{"id":2396,"date":"2019-07-25T06:34:38","date_gmt":"2019-07-25T06:34:38","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/25\/comparing-model-evaluation-techniques-part-3-regression-models\/"},"modified":"2019-07-25T06:34:38","modified_gmt":"2019-07-25T06:34:38","slug":"comparing-model-evaluation-techniques-part-3-regression-models","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/25\/comparing-model-evaluation-techniques-part-3-regression-models\/","title":{"rendered":"Comparing Model Evaluation Techniques Part 3: Regression Models"},"content":{"rendered":"<p>Author: Stephanie Glen<\/p>\n<div>\n<p style=\"text-align: left;\">In my previous posts, I compared model evaluation techniques using <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/comparing-model-evaluation-techniques\">Statistical Tools &#038; Tests<\/a> and commonly used <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/comparing-model-evaluation-techniques-part-2\">Classification and Clustering evaluation techniques<\/a>.<\/p>\n<p>In this post, I&#8217;ll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks in the &#8220;comparing models&#8221; arena; the reason is that there are literally <em>dozens<\/em> of statistics you can calculate to compare regression models, including:<\/p>\n<p>1. 
<strong>Error measures in the estimation period<\/strong> (in-sample testing) or <strong>validation period<\/strong> (out-of-sample testing):<\/p>\n<ul>\n<li><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/absolute-error\/\" target=\"_blank\" rel=\"noopener noreferrer\">Mean Absolute Error (MAE),<\/a><\/li>\n<li><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/mean-absolute-percentage-error-mape\/\" target=\"_blank\" rel=\"noopener noreferrer\">Mean Absolute Percentage Error (MAPE),<\/a><\/li>\n<li><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/mean-error\/\" target=\"_blank\" rel=\"noopener noreferrer\">Mean Error,<\/a><\/li>\n<li><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/rmse\/\" target=\"_blank\" rel=\"noopener noreferrer\">Root Mean Squared Error (RMSE),<\/a><\/li>\n<\/ul>\n<p><strong>2. Tests on Residuals and Goodness-of-Fit:<\/strong><\/p>\n<ul>\n<li><strong>Plots<\/strong>: actual vs. predicted value; cross correlation; <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/residual\/\" target=\"_blank\" rel=\"noopener noreferrer\">residual<\/a> <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/serial-correlation-autocorrelation\/\" target=\"_blank\" rel=\"noopener noreferrer\">autocorrelation<\/a>; residuals vs. time\/predicted values,<\/li>\n<li><strong>Changes<\/strong> in <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/statistics-definitions\/mean-median-mode\/#mean\" target=\"_blank\" rel=\"noopener noreferrer\">mean<\/a>\u00a0or <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/variance\/\" target=\"_blank\" rel=\"noopener noreferrer\">variance<\/a>,<\/li>\n<li><strong>Tests<\/strong>: <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/normal-probability-plot\/\" target=\"_blank\" rel=\"noopener noreferrer\">normally distributed errors<\/a>; excessive runs (e.g. 
of positives or negatives); <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/find-outliers\/\" target=\"_blank\" rel=\"noopener noreferrer\">outliers<\/a>\/extreme values\/influential observations.<\/li>\n<\/ul>\n<p>This list isn&#8217;t exhaustive&#8211;there are many other tools, tests and plots at your disposal. Rather than discuss the statistics in detail, I chose to focus this post on\u00a0<strong>comparing a few of the most popular regression model evaluation techniques<\/strong> and discuss when you might want to use them (or when you might <em>not<\/em> want to). The techniques listed below tend to be on the &#8220;easier to use and understand&#8221; end of the spectrum, so if you&#8217;re new to model comparison they&#8217;re a good place to start.<\/p>\n<h2>Where to Start<\/h2>\n<p>The first question you should be asking is: <em>How well do I know my data?<\/em> In order to evaluate regression models, you need to know what results would be reasonable for your particular situation. For example, if you compare changes in mean or variance, one model might give you impossible results, while another might be overly complicated for the task at hand. The ideal model isn&#8217;t one that&#8217;s just &#8220;correct&#8221;; it also needs to be relatively simple and useful for the decision-making process&#8211;something that won&#8217;t be immediately obvious unless you know your data really well.<\/p>\n<p>Which technique you choose is largely dependent on the software you have at hand (e.g. R, SPSS, or Excel).\u00a0If you&#8217;re using Excel, a word of advice: <strong>stop<\/strong>. It was never designed for serious statistical work and has <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/excel-statistics\/#tips\" target=\"_blank\" rel=\"noopener noreferrer\">significant statistical problems<\/a>. 
<a href=\"https:\/\/people.duke.edu\/~rnau\/411regou.htm\" target=\"_blank\" rel=\"noopener noreferrer\">Duke University&#8217;s Robert Nau<\/a> puts it best: &#8220;<i><span>It&#8217;s a toy (a clumsy one at that), not a tool for serious work.&#8221;\u00a0<\/span><\/i> The number of models you&#8217;re testing also comes into play. Arguably, which statistic you use (<a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/coefficient-of-determination-r-squared\/\" target=\"_blank\" rel=\"noopener noreferrer\">r-squared<\/a>, <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/p-value\/\" target=\"_blank\" rel=\"noopener noreferrer\">p-values<\/a> etc.) is mostly personal preference (although the test you use might force that choice upon you). Each of the statistics has its pluses and minuses, its advantages and disadvantages. I won&#8217;t bloat this article with all the comparisons between the statistics, but if you&#8217;re interested I&#8217;ve linked where possible to articles that explain those in detail.<\/p>\n<h2>Nested Models<\/h2>\n<p><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/nested-model-anova-factors\/#model\" target=\"_blank\" rel=\"noopener noreferrer\">Nested models<\/a> are models that are subsets of one another; if you can get one model by constraining the parameters of another, then that model is nested. Nested models require different evaluation techniques, and there isn&#8217;t a single, agreed-upon way to test for the &#8220;best&#8221; model.<\/p>\n<p>Possibly the <strong>easiest<\/strong> way to compare nested models (it can be used with a very basic understanding of statistics) is to simply <a href=\"https:\/\/psych.unl.edu\/psycrs\/statpage\/ldfnonnested.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">measure how well each model performs reclassification<\/a>. The &#8220;better&#8221; model will have higher rates of correct reclassification. 
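As a minimal sketch of the reclassification idea (the group labels and model names below are invented for illustration, not taken from the linked handout): reclassify each case with each model and compare the proportions classified correctly.

```python
# Compare two models by their correct-reclassification rates.
# Synthetic example: actual group labels vs. each model's predicted labels.
actual  = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
model_a = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # misses one case
model_b = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]  # misses three cases

def reclassification_rate(actual, predicted):
    """Proportion of cases a model assigns to the correct group."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

rate_a = reclassification_rate(actual, model_a)  # 0.9
rate_b = reclassification_rate(actual, model_b)  # 0.7
best = "A" if rate_a > rate_b else "B"            # "A" wins here
```

The appeal is that nothing beyond a proportion is needed; the drawback is that a raw rate ignores chance agreement, which is why a chi-square follow-up is often used.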
A <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/chi-square\/\" target=\"_blank\" rel=\"noopener noreferrer\">chi-square<\/a>\u00a0analysis can be used, although if you run a test for <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/hypothesis-testing\/anova\/#sphericity\" target=\"_blank\" rel=\"noopener noreferrer\">sphericity<\/a>\u00a0you must use a different chi-square value.\u00a0<\/p>\n<p>If you&#8217;re comparing nested models (perhaps you want to know if the simplest model is adequate), you can compare them with a <strong><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/t-statistic\/\" target=\"_blank\" rel=\"noopener noreferrer\">t-statistic<\/a><\/strong>. You can only run a test for <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/what-is-statistical-significance\/\" target=\"_blank\" rel=\"noopener noreferrer\">significance<\/a>\u00a0against a single extra <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/regression-analysis\/find-a-linear-regression-equation\/#linregCoefficient\" target=\"_blank\" rel=\"noopener noreferrer\">coefficient<\/a>. In other words, you can&#8217;t run it if you have more than one additional coefficient from one model to the next. 
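When the larger model adds more than one coefficient, the extra sum of squares F-test generalizes this idea: it compares the drop in residual sum of squares against the full model's error variance. A minimal NumPy sketch on synthetic data (the variable names and simulated effect are illustrative, not from the original post):

```python
import numpy as np

# Simulate data where the extra predictor x2 genuinely matters.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

def fit_ssr(X, y):
    """Least-squares fit; return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
ssr_reduced = fit_ssr(np.column_stack([ones, x1]), y)      # smaller model
ssr_full    = fit_ssr(np.column_stack([ones, x1, x2]), y)  # adds x2

q = 1                # number of extra coefficients being tested
df_full = n - 3      # n minus the number of parameters in the full model
F = ((ssr_reduced - ssr_full) / q) / (ssr_full / df_full)
# A large F (compared to the F(q, df_full) distribution) means the
# extra coefficient(s) add real explanatory power.
```

With a single extra coefficient this F equals the square of the t-statistic above, so the two tests agree in that case.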
<a href=\"https:\/\/www.stat.ncsu.edu\/people\/bloomfield\/courses\/st370\/Slides\/MandR-ch12-sec02-06.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">This article<\/a> has instructions in R, as well as a fairly detailed overview of running the\u00a0<em>general regression test<\/em>\u00a0or the <em>extra\u00a0sum of squares test<\/em>.\u00a0<\/p>\n<p>According to <a href=\"https:\/\/psych.unl.edu\/psycrs\/statpage\/rhtest_eg1.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Calvin Garbin of the University of Nebraska Lincoln<\/a>, with <strong>SPSS<\/strong> you can compare nested models in two different ways using <strong><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/coefficient-of-determination-r-squared\/\" target=\"_blank\" rel=\"noopener noreferrer\">r-squared<\/a>:<\/strong><\/p>\n<ul>\n<li>Get the multiple regression results for each model, then compare the models using the FZT Computator&#8217;s\u00a0<em>R\u00b2 change F-test<\/em>.<\/li>\n<li>Change from one model to another in SPSS, calculating the <em>R\u00b2-change F-test<\/em>. Although convenient, this doesn&#8217;t always calculate the statistic correctly.<\/li>\n<\/ul>\n<p>Garbin&#8217;s article has a couple of excellent examples of how to perform the above tasks, as well as SPSS procedures for comparing non-nested models using <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/correlation-analysis\/\" target=\"_blank\" rel=\"noopener noreferrer\">correlations<\/a>.<\/p>\n<p>An <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/hypothesis-testing\/anova\/\" target=\"_blank\" rel=\"noopener noreferrer\">ANOVA<\/a> F-test can compare two nested models, where one is a subset of the other. 
It can test a single extra predictor, or\u00a0several predictors at a time.\u00a0<\/p>\n<p><strong>Multiple models<\/strong> can be compared using <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/forward-selection\/\" target=\"_blank\" rel=\"noopener noreferrer\">forward selection<\/a>, backward elimination, or stepwise selection. Basically, these are all variants of each other\u00a0and involve\u00a0adding or removing predictors based on the smallest <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/f-statistic-value-test\/\" target=\"_blank\" rel=\"noopener noreferrer\">f-value<\/a> \/ t-value or\u00a0largest associated <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/p-value\/\" target=\"_blank\" rel=\"noopener noreferrer\">p-value<\/a>. These techniques can only be used on nested models, but they can all miss optimal models and&#8211;if you run all three on the same models&#8211;they may not agree with each other.<\/p>\n<h2>Non Nested Models<\/h2>\n<p>Non nested models have fewer options for comparison. Because the models aren&#8217;t nested, neither are the statistics you would use to compare them (e.g. a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/chi-square\/\" target=\"_blank\" rel=\"noopener noreferrer\">chi-square statistic<\/a>), so the nested-model tests above don&#8217;t apply. 
In layman&#8217;s terms, if your models are nested then you&#8217;re comparing apples to apples, which is much easier than comparing apples to oranges.<\/p>\n<p>One of the simplest comparison methods is the <strong><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/bayesian-information-criterion\/\" target=\"_blank\" rel=\"noopener noreferrer\">Bayesian information criterion<\/a><\/strong> (BIC). Despite the daunting math behind the calculations, most statistical software will calculate the BIC for each model. This leaves you to simply interpret the results: the model with the lowest BIC is considered the best. It&#8217;s often preferred over other Bayesian methods like <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/bayes-factor-definition\/\" target=\"_blank\" rel=\"noopener noreferrer\">Bayes Factors<\/a>, because BIC doesn&#8217;t require you to have knowledge about <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/prior-probability-uniformative-conjugate\/\" target=\"_blank\" rel=\"noopener noreferrer\">priors<\/a>.<\/p>\n<p><span><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/akaikes-information-criterion\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Akaike\u2019s Information Criterion<\/strong><\/a>\u00a0(AIC) is similar to BIC, except that the BIC tends to favor models with fewer parameters. AIC ranks each model\u00a0from best to worst. 
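For ordinary least-squares fits with Gaussian errors, both criteria reduce to simple functions of the residual sum of squares. The NumPy sketch below (synthetic numbers, illustrative only; the "+1" counts the estimated error variance as a parameter, one common convention) shows why BIC's log(n) penalty punishes extra parameters harder than AIC's flat 2 per parameter:

```python
import numpy as np

def aic_bic(ssr, n, k):
    """Gaussian-likelihood AIC and BIC for a least-squares fit.
    ssr: residual sum of squares, n: observations, k: regression
    parameters (including intercept); +1 adds the error variance."""
    loglik_term = n * np.log(ssr / n)
    aic = loglik_term + 2 * (k + 1)
    bic = loglik_term + np.log(n) * (k + 1)
    return aic, bic

# Two hypothetical fits on the same n = 50 points: the bigger model
# barely reduces the residual sum of squares.
aic_small, bic_small = aic_bic(ssr=120.0, n=50, k=2)  # y ~ x
aic_big,   bic_big   = aic_bic(ssr=118.0, n=50, k=6)  # y ~ 5 predictors
# Lower is better on both criteria; here both prefer the small model,
# and BIC's gap is wider because log(50) > 2.
```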
A major downside is that it doesn&#8217;t say anything about <em>quality<\/em>; it will choose the &#8220;best&#8221; even if you input a series of poor-quality models.\u00a0<\/span><\/p>\n<p>The benefit of the <strong><a href=\"https:\/\/rdrr.io\/cran\/lmtest\/man\/coxtest.html\" target=\"_blank\" rel=\"noopener noreferrer\">Cox test<\/a><\/strong>\u00a0is that it&#8217;s relatively simple (in comparison to the BIC or AIC) to understand what the test is doing behind the scenes. Let&#8217;s say you were comparing models A and B. If model A contains the correct regressors, then fitting model B&#8217;s regressors alongside model A&#8217;s should yield zero further explanatory value. If there is further explanatory value, then model A doesn&#8217;t contain the correct regressor set. You run the test twice&#8211;the second time from B to A&#8211;and compare your findings.\u00a0<a href=\"https:\/\/rdrr.io\/cran\/lmtest\/man\/coxtest.html\" target=\"_blank\" rel=\"noopener noreferrer\">See: Performing the Cox Test in R.<\/a><\/p>\n<h2>References<\/h2>\n<p><a href=\"https:\/\/people.duke.edu\/~rnau\/compare.htm\" target=\"_blank\" rel=\"noopener noreferrer\">Comparing Models<\/a><\/p>\n<p><a href=\"https:\/\/www.stat.ncsu.edu\/people\/bloomfield\/courses\/st370\/Slides\/MandR-ch12-sec02-06.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Comparing Nested Models<\/a><\/p>\n<p><a href=\"http:\/\/pages.stat.wisc.edu\/~ane\/st572\/notes\/lec05.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Outline: Significance Testing<\/a><\/p>\n<p><a href=\"https:\/\/people.duke.edu\/~rnau\/411regou.htm\" target=\"_blank\" rel=\"noopener noreferrer\">Linear Regression Models<\/a><\/p>\n<p><a href=\"https:\/\/web.stanford.edu\/~doubleh\/eco273B\/nonnestmsc.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Non nested model selection criteria<\/a><\/p>\n<p><a 
href=\"http:\/\/math.furman.edu\/~dcs\/courses\/math47\/R\/library\/lmtest\/html\/coxtest.html\" target=\"_blank\" rel=\"noopener noreferrer\">Cox test for comparing non nested models<\/a><\/p>\n<p><a href=\"https:\/\/psych.unl.edu\/psycrs\/statpage\/ldfnonnested.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Quiz #3 Research Hypotheses that Involve Comparing Non-Nested Models<\/a><\/p>\n<p><span>R. Davidson &#038; J. MacKinnon (1981). Several Tests for Model Specification in the Presence of Alternative Hypotheses.\u00a0<\/span><em>Econometrica<\/em><span>,\u00a0<\/span><b>49<\/b><span>, 781-793.<\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:860502\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Stephanie Glen In my previous posts, I compared model evaluation techniques using Statistical Tools &#038; Tests and commonly used Classification and Clustering evaluation techniques [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/25\/comparing-model-evaluation-techniques-part-3-regression-models\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":457,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2396"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2396"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2396\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/457"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2396"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2396"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}