{"id":2407,"date":"2019-07-29T06:39:49","date_gmt":"2019-07-29T06:39:49","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/29\/decision-tree-vs-random-forest-vs-gradient-boosting-machines-explained-simply\/"},"modified":"2019-07-29T06:39:49","modified_gmt":"2019-07-29T06:39:49","slug":"decision-tree-vs-random-forest-vs-gradient-boosting-machines-explained-simply","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/29\/decision-tree-vs-random-forest-vs-gradient-boosting-machines-explained-simply\/","title":{"rendered":"Decision Tree vs Random Forest vs Gradient Boosting Machines: Explained Simply"},"content":{"rendered":"<p>Author: Stephanie Glen<\/p>\n<div>\n<p><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/decision-tree-definition-and-examples\/\" target=\"_blank\" rel=\"noopener noreferrer\">Decision Trees<\/a>, <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/random-forests-explained-intuitively\" target=\"_blank\" rel=\"noopener noreferrer\">Random Forests<\/a> and <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/boosting-algorithms-for-better-predictions\" target=\"_blank\" rel=\"noopener noreferrer\">Boosting<\/a>\u00a0are among the <a href=\"https:\/\/www.kdnuggets.com\/2017\/12\/top-data-science-machine-learning-methods.html\" target=\"_blank\" rel=\"noopener noreferrer\">top 16 data science and machine learning tools<\/a> used by data scientists. The three methods are similar, with a significant amount of overlap. 
In a nutshell:<\/p>\n<ul>\n<li>A <strong>decision tree<\/strong> is a simple decision-making diagram.<\/li>\n<li><strong>Random forests<\/strong> are a large number of trees, combined (using averages or &#8220;majority rules&#8221;) at the end of the process.<\/li>\n<li><strong>Gradient boosting machines<\/strong> also combine decision trees, but start the combining process at the beginning, instead of at the end.<\/li>\n<\/ul>\n<h2>Decision Trees and Their Problems<\/h2>\n<p><strong>Decision trees<\/strong> are a series of sequential steps designed to answer a question and provide probabilities, costs, or other consequences of making a particular decision.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3389672795?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3389672795?profile=RESIZE_710x\" class=\"align-full\"><\/a><\/p>\n<p>They are simple to understand, providing a clear visual to guide the decision-making process. 
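The sequential true/false splits of a decision tree can be sketched with a minimal scikit-learn example (the tiny dataset and feature names here are purely illustrative, and scikit-learn is assumed to be installed):

```python
# A minimal decision tree sketch (illustrative data; assumes scikit-learn).
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: [age, income]; toy labels: 1 = "buy", 0 = "don't buy".
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The fitted tree is a readable series of sequential true/false splits.
print(export_text(tree, feature_names=["age", "income"]))

prediction = tree.predict([[40, 70]])[0]
```

Printing the fitted tree with `export_text` makes the "follow the path" readability of a single tree concrete.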
However, this simplicity comes with a few <em>serious disadvantages<\/em>, including <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/regression-analysis\/#overfitting\" target=\"_blank\" rel=\"noopener noreferrer\">overfitting<\/a>,\u00a0<span>error due to <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/what-is-bias\/\" target=\"_blank\" rel=\"noopener noreferrer\">bias<\/a>\u00a0and error due to <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/probability-and-statistics\/variance\/\" target=\"_blank\" rel=\"noopener noreferrer\">variance<\/a>.\u00a0<\/span><\/p>\n<ul>\n<li><span><strong>Overfitting<\/strong> happens for many reasons, including the presence of <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/statistical-noise\/\" target=\"_blank\" rel=\"noopener noreferrer\">noise<\/a>\u00a0and a lack of representative instances. A single large (deep) tree is particularly prone to overfitting.\u00a0<\/span><\/li>\n<li><strong>Bias error<\/strong> happens when you place too many restrictions on target functions. For example, constraining your result with a restrictive function (e.g. a <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/linear-relationship\/#lineq\" target=\"_blank\" rel=\"noopener noreferrer\">linear equation<\/a>) or with a simple binary algorithm (like the true\/false choices in the above tree) will often result in bias.<\/li>\n<li><strong>Variance error<\/strong> refers to how much a result will change based on changes to the training set. Decision trees have high variance: tiny changes in the training data can cause large changes in the final result.<\/li>\n<\/ul>\n<h2>Random Forest vs Decision Trees<\/h2>\n<p>As noted above, decision trees are fraught with problems. A tree generated from 99 data points might differ significantly from a tree generated from the same data with just one point changed. 
If there were a way to generate a very large number of trees and average out their solutions, you would likely get an answer very close to the true one. Enter the\u00a0<strong>random forest<\/strong>\u2014a collection of decision trees with a single, aggregated result. Random forests\u00a0are commonly reported to be among the most accurate learning algorithms.\u00a0<\/p>\n<p>Random forests reduce the variance seen in decision trees by:<\/p>\n<ol>\n<li><span>Using different samples for training,<\/span><\/li>\n<li><span>Specifying random feature subsets,\u00a0<\/span><\/li>\n<li><span>Building and combining small (shallow) trees.<\/span><\/li>\n<\/ol>\n<p class=\"ui_qtext_para u-ltr u-text-align--start\">A single decision tree is a weak predictor, but it is relatively fast to build. More trees give you a more <a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/robust-statistics\/\" target=\"_blank\" rel=\"noopener noreferrer\">robust<\/a>\u00a0model and help prevent overfitting. However, <strong>the more trees you have, the slower the process.<\/strong> Each tree in the forest has to be generated, processed, and analyzed. 
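The three variance-reduction levers listed above map directly onto common hyperparameters; a minimal scikit-learn sketch, with illustrative synthetic data and parameter values:

```python
# Random forest sketch (illustrative data and settings; assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # many trees, aggregated by majority vote
    bootstrap=True,       # each tree trains on a different sample
    max_features="sqrt",  # random feature subset considered at each split
    max_depth=4,          # keep individual trees small (shallow)
    random_state=0,
).fit(X, y)

n_trees = len(forest.estimators_)
train_accuracy = forest.score(X, y)
```

Raising `n_estimators` trades speed for robustness, which is exactly the "more trees, slower process" trade-off described above.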
In addition, the more features you have, the slower the process (which can sometimes take <a href=\"https:\/\/stackoverflow.com\/questions\/23075506\/how-to-improve-randomforest-performance\" target=\"_blank\" rel=\"noopener noreferrer\">hours<\/a>\u00a0or even <a href=\"https:\/\/github.com\/haifengl\/smile\/issues\/257\" target=\"_blank\" rel=\"noopener noreferrer\">days<\/a>); reducing the set of features can dramatically speed up the process.<\/p>\n<p class=\"ui_qtext_para u-ltr u-text-align--start\">Another distinct difference between a decision tree and a random forest is that while a decision tree is easy to read\u2014you just follow the path and find a result\u2014a random forest is a tad more <strong>complicated to interpret<\/strong>. There is a slew of articles out there designed to help you read the results from random forests (like <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/10\/interpret-random-forest-model-machine-learning-programmers\/\" target=\"_blank\" rel=\"noopener noreferrer\">this one<\/a>), but in comparison to decision trees, the learning curve is steep.<\/p>\n<h2>Random Forest vs Gradient Boosting<\/h2>\n<p>Like random forests, <a href=\"https:\/\/towardsdatascience.com\/understanding-gradient-boosting-machines-9be756fe76ab\" target=\"_blank\" rel=\"noopener noreferrer\">gradient boosting<\/a> is a set of decision trees. The two main differences are:<\/p>\n<ol>\n<li><strong>How trees are built:<\/strong> random forests build each tree independently, while\u00a0gradient boosting builds one tree at a time. 
This additive model (ensemble) works in a forward\u00a0stage-wise manner, introducing a\u00a0weak learner to <a href=\"http:\/\/www.ccs.neu.edu\/home\/vip\/teach\/MLcourse\/4_boosting\/slides\/gradient_boosting.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">improve the\u00a0shortcomings of existing weak learners<\/a>.\u00a0<\/li>\n<li><strong>Combining results:<\/strong> random forests combine results at the end of the process (by averaging or &#8220;majority rules&#8221;) while gradient boosting combines results along the way.<\/li>\n<\/ol>\n<p><span>If you carefully tune parameters, gradient boosting can result in <strong>better performance<\/strong> than random forests. However, <strong>gradient boosting may not be a good choice if you have a lot of noise<\/strong>, as it can result in overfitting. Gradient boosting models also tend to be <strong>harder to tune<\/strong> than random forests.<\/span><\/p>\n<p><span>Random forests and gradient boosting each excel in different areas. Random forests perform well for\u00a0<a href=\"http:\/\/www.svcl.ucsd.edu\/publications\/conference\/2014\/nips2014.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">multi-class object detection<\/a>\u00a0and <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2648734\/?source=post_page---------------------------\" target=\"_blank\" rel=\"noopener noreferrer\">bioinformatics<\/a>,\u00a0which tends to have a lot of statistical noise. 
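The stage-wise, "combine along the way" behavior described above can be observed in scikit-learn via staged predictions; a hedged sketch with illustrative synthetic data:

```python
# Gradient boosting sketch (illustrative data and settings; assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=50,    # trees are added one at a time
    learning_rate=0.1,  # each new tree makes a small corrective step
    max_depth=3,        # weak learners: shallow trees
    random_state=0,
).fit(X, y)

# staged_predict yields the ensemble's predictions after each boosting
# stage, so you can watch results being combined along the way rather
# than only at the end.
staged_scores = [
    (stage_pred == y).mean() for stage_pred in gbm.staged_predict(X)
]
```

Plotting `staged_scores` against the stage index is a common way to see each weak learner correcting the shortcomings of the ensemble built so far.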
Gradient boosting performs well when you have unbalanced data, such as in <a href=\"https:\/\/www.academia.edu\/7707785\/Application_of_Stochastic_Gradient_Boosting_SGB_Technique_to_Enhance_the_Reliability_of_Real-Time_Risk_Assessment_Using_AVI_and_RTMS_Data\" target=\"_blank\" rel=\"noopener noreferrer\">real-time risk assessment<\/a>.<\/span><\/p>\n<h2>References<\/h2>\n<p><a href=\"https:\/\/www.kdnuggets.com\/2017\/12\/top-data-science-machine-learning-methods.html\" target=\"_blank\" rel=\"noopener noreferrer\">Top Data Science and Machine Learning Methods Used in 2017<\/a><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/random-forests-explained-intuitively\" target=\"_blank\" rel=\"noopener noreferrer\">Random Forests explained intuitively<\/a><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/boosting-algorithms-for-better-predictions\" target=\"_blank\" rel=\"noopener noreferrer\">Boosting Algorithms for Better Predictions<\/a><\/p>\n<p><a href=\"http:\/\/protocols.netlab.uky.edu\/~liuj\/teaching\/CS485g\/notes\/Reference\/24%20-%20Decision%20Trees%20-%20overfitting.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Overfitting in Decision Trees<\/a><\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">Gentle Introduction to the Bias-Variance Trade-Off in Machine Learning<\/a><\/p>\n<p><a href=\"https:\/\/stackoverflow.com\/questions\/23075506\/how-to-improve-randomforest-performance\" target=\"_blank\" rel=\"noopener noreferrer\">How to improve random Forest performance?<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/haifengl\/smile\/issues\/257\" target=\"_blank\" rel=\"noopener noreferrer\">Training a Random Forest with a big dataset seems very slow #257<\/a><\/p>\n<p><a href=\"http:\/\/pages.cs.wisc.edu\/~matthewb\/pages\/notes\/pdf\/ensembles\/RandomForests.pdf\" target=\"_blank\" 
rel=\"noopener noreferrer\">Random Forests<\/a><\/p>\n<p><a href=\"https:\/\/medium.com\/@aravanshad\/gradient-boosting-versus-random-forest-cfa3fa8f0d80\" target=\"_blank\" rel=\"noopener noreferrer\">Gradient Boosting vs Random Forests<\/a><\/p>\n<p><span><a href=\"http:\/\/www.svcl.ucsd.edu\/publications\/conference\/2014\/nips2014.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Multi-class object detection<\/a><\/span><\/p>\n<p><span><a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2648734\/?source=post_page---------------------------\" target=\"_blank\" rel=\"noopener noreferrer\">Using random forest for reliable classification and cost-sensitive learning for medical diagnosis<\/a><\/span><\/p>\n<p><span><a href=\"https:\/\/towardsdatascience.com\/understanding-gradient-boosting-machines-9be756fe76ab\" target=\"_blank\" rel=\"noopener noreferrer\">Applications of Gradient Boosting Machines<\/a><\/span><\/p>\n<p><span><a href=\"http:\/\/www.ccs.neu.edu\/home\/vip\/teach\/MLcourse\/4_boosting\/slides\/gradient_boosting.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">A Gentle Introduction to Gradient Boosting<\/a><\/span><\/p>\n<p><span><a href=\"https:\/\/www.academia.edu\/7707785\/Application_of_Stochastic_Gradient_Boosting_SGB_Technique_to_Enhance_the_Reliability_of_Real-Time_Risk_Assessment_Using_AVI_and_RTMS_Data\" target=\"_blank\" rel=\"noopener noreferrer\">Application of Stochastic Gradient Boosting (SGB) Technique to Enhance the Reliability of Real-Time Risk Assessment Using AVI and RTMS Data<\/a><\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:862337\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Stephanie Glen Decision Trees, Random Forests and Boosting\u00a0are among the top 16 data science and machine learning tools used by data scientists. 
The three [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/29\/decision-tree-vs-random-forest-vs-gradient-boosting-machines-explained-simply\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2407"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2407"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2407\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/467"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2407"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2407"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2407"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}