{"id":2353,"date":"2019-07-11T06:35:40","date_gmt":"2019-07-11T06:35:40","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/11\/comparing-model-evaluation-techniques-part-1-statistical-tools-tests\/"},"modified":"2019-07-11T06:35:40","modified_gmt":"2019-07-11T06:35:40","slug":"comparing-model-evaluation-techniques-part-1-statistical-tools-tests","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/11\/comparing-model-evaluation-techniques-part-1-statistical-tools-tests\/","title":{"rendered":"Comparing Model Evaluation Techniques Part 1: Statistical Tools &amp; Tests"},"content":{"rendered":"<p>Author: Stephanie Glen<\/p>\n<div>\n<p>Evaluating a model is just as important as creating the model in the first place. Even if you use the most statistically sound tools to create your model, the end result may not be what you expected. Which metric you use to test your model depends on the type of data you\u2019re working with and your comfort level with statistics.<\/p>\n<p>Model evaluation techniques answer <strong>three main questions:<\/strong><\/p>\n<ol>\n<li>How well does your model match your data (in other words, what is the <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/goodness-of-fit-test\/\" target=\"_blank\" rel=\"noopener noreferrer\">goodness of fit<\/a>)?<\/li>\n<li>\u00a0Assuming you&#8217;ve created multiple models, which one is the best? (Note that the &#8220;best&#8221; can have different criteria according to situation\/personal preferences; What is best for one situation may not be best for another.)<\/li>\n<li>Will your model predict new observations for your data set with accuracy?<\/li>\n<\/ol>\n<p>The following summary of model evaluation techniques is by no means exhaustive; it\u2019s intended to be a starting point if you\u2019re unfamiliar with the available techniques. In part 1, I discuss some of the common Statistical Tools and Tests. 
Part 2 will cover tools for Clustering and Classification.<\/p>\n<h2>BASIC STATISTICAL TOOLS<\/h2>\n<p><strong>1. Confidence Interval<\/strong><\/p>\n<p>A <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/confidence-interval\/\" target=\"_blank\" rel=\"noopener noreferrer\">confidence interval<\/a> is a measure of how <strong>reliable a statistical estimate<\/strong> is. For example, you might have calculated a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/statistics-definitions\/mean-median-mode\/\" target=\"_blank\" rel=\"noopener noreferrer\">mean<\/a> or <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/standard-deviation\/\" target=\"_blank\" rel=\"noopener noreferrer\">standard deviation<\/a> for your data set. But how reliable is that estimate? A confidence interval gives you an easy-to-understand range. For example, if your calculated mean is 10 feet, a 99% confidence interval of 9 feet to 11 feet tells you that you can be 99% confident the true mean lies within 1 foot of your estimate.<\/p>\n<p><strong>2. Root Mean Square Error (RMSE)<\/strong><\/p>\n<p>RMSE is a measure of how spread out <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/residual\/\" target=\"_blank\" rel=\"noopener noreferrer\">residuals<\/a> are. In other words, it tells you how concentrated the data is around the <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/line-of-best-fit\/\">line of best fit<\/a>. It\u2019s one of the most popular metrics for evaluating continuous data, and is widely used in Excel. However, it\u2019s a lot trickier to understand than simple statistics like confidence intervals because of the more complex calculations involved. 
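<\/p>
<p>As a concrete illustration, RMSE can be computed in a few lines of plain Python (a minimal sketch; the data values below are made up for the example, not taken from the article):<\/p>

```python
import math

def rmse(actual, predicted):
    # Square each residual, average the squares, then take the square root.
    residuals = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Illustrative observed values vs. a model's predictions
actual = [3.0, 5.0, 7.5, 10.0]
predicted = [2.5, 5.0, 7.0, 11.0]
print(round(rmse(actual, predicted), 4))  # 0.6124
```

<p>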
It also penalizes higher differences, meaning that it\u2019s sensitive to <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/find-outliers\/\" target=\"_blank\" rel=\"noopener noreferrer\">outliers<\/a>. Even if you don\u2019t understand the calculations behind RMSE, Excel will still spit out an answer, leaving you to puzzle over the significance of the result. Complicating matters further, RMSE (and similar metrics like <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/what-is-bias\/\" target=\"_blank\" rel=\"noopener noreferrer\">bias<\/a> and <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/correlation-analysis\/\" target=\"_blank\" rel=\"noopener noreferrer\">correlation coefficients<\/a>) can really only be fully understood if you are very familiar with the underlying data and model.<\/p>\n<p><strong>L^1 version of RMSE<\/strong><\/p>\n<p>RMSE\u2019s sensitivity to outliers is one reason it has fallen out of favor with many data scientists. An alternative is a more modern version, like the L^1 version <a href=\"http:\/\/www.analyticbridge.com\/profiles\/blogs\/correlation-and-r-squared-for-big-data\">described here<\/a>.<\/p>\n<h2>STATISTICAL TESTS<\/h2>\n<p>While basic statistical tools (like those listed above) are fairly easy for the non-statistician to understand, delving into the arena of statistical tests requires much more in-depth knowledge, not only of statistics but of your data and model. The major advantage of running a statistical test is that it gives you greater confidence in your results. For many professional arenas and publications, statistical tests are an absolute must. The downside is that you really have to know your data inside and out in order to interpret the results from these tests; otherwise, it\u2019s easy to misinterpret them.<\/p>\n<p><strong><u>1. 
Kolmogorov-Smirnov Test<\/u><\/strong><\/p>\n<p>The <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/kolmogorov-smirnov-test\/\" target=\"_blank\" rel=\"noopener noreferrer\">Kolmogorov-Smirnov Goodness of Fit Test (K-S test)<\/a> is a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/parametric-and-non-parametric-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">distribution-free test<\/a> that compares your data with a known distribution (usually a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/normal-distributions\/\" target=\"_blank\" rel=\"noopener noreferrer\">normal distribution<\/a>) and tells you whether the two have the same distribution. The fact that you don\u2019t have to know the underlying distribution is a great advantage, but the test has several drawbacks, including the fact that you have to specify the <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/location-parameter\/\" style=\"font-style: inherit; font-weight: inherit;\">location<\/a>, <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/scale-parameter\/\" style=\"font-style: inherit; font-weight: inherit;\">scale<\/a>, and <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/shape-parameter\/\" style=\"font-style: inherit; font-weight: inherit;\">shape<\/a> parameters; these cannot be estimated from the data, as doing so invalidates the test. Another big disadvantage is that it usually can\u2019t be used for <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/discrete-vs-continuous-variables\/\" target=\"_blank\" rel=\"noopener noreferrer\">discrete data<\/a> without some particularly cumbersome calculations or a software add-on.<\/p>\n<p><strong>2. 
Lilliefors Test<\/strong><\/p>\n<p>The <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/lilliefors-test\/\">Lilliefors test<\/a> is a corrected version of the K-S test for normality that generally gives a more accurate approximation of the test statistic\u2019s distribution. This is especially true if you don\u2019t know the population mean and standard deviation (which is usually the case). Many statistical packages (like SPSS) combine the two as a \u201cLilliefors corrected\u201d K-S test.<\/p>\n<p><strong>3. Chi-Square<\/strong><\/p>\n<p>The Chi-Square test is similar to Kolmogorov-Smirnov, but it is a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/parametric-statistics\/\" target=\"_blank\" rel=\"noopener noreferrer\">parametric test<\/a>, which means you have to know the underlying distribution in order to work with it. While the K-S test performs poorly when you estimate population parameters, chi-square can be run successfully with estimates. That said, a downside is that it doesn\u2019t work well with small sample sizes.<\/p>\n<p>In general, K-S is usually preferred for its higher <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/statistical-power\/\" target=\"_blank\" rel=\"noopener noreferrer\">power<\/a>. However, when you don\u2019t know a certain population parameter (i.e. 
the one you&#8217;re trying to estimate, like the mean), chi-square may be the better choice.<\/p>\n<h2>References<\/h2>\n<p><a href=\"https:\/\/pdfs.semanticscholar.org\/0c3a\/736708f50897690875b6e578e62657df437b.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Comparison of the Goodness-of-Fit Tests: the Pearson Chi-square and Kolmogorov-Smirnov Tests<\/a><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/7-important-model-evaluation-error-metrics-everyone-should-know\" target=\"_blank\" rel=\"noopener noreferrer\">11 Important Model Evaluation Techniques Everyone Should Know<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:835160\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Stephanie Glen Evaluating a model is just as important as creating the model in the first place. Even if you use the most statistically [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/11\/comparing-model-evaluation-techniques-part-1-statistical-tools-tests\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":465,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2353"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2353"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2353\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2353"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2353"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2353"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}