{"id":3397,"date":"2020-04-30T19:00:16","date_gmt":"2020-04-30T19:00:16","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/04\/30\/how-to-develop-an-adaboost-ensemble-in-python\/"},"modified":"2020-04-30T19:00:16","modified_gmt":"2020-04-30T19:00:16","slug":"how-to-develop-an-adaboost-ensemble-in-python","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/04\/30\/how-to-develop-an-adaboost-ensemble-in-python\/","title":{"rendered":"How to Develop an AdaBoost Ensemble in Python"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Boosting is a class of ensemble machine learning algorithms that involve combining the predictions from many weak learners.<\/p>\n<p>A weak learner is a model that is very simple, although has some skill on the dataset. Boosting was a theoretical concept long before a practical algorithm could be developed, and the AdaBoost (adaptive boosting) algorithm was the first successful approach for the idea.<\/p>\n<p>The AdaBoost algorithm involves using very short (one-level) decision trees as weak learners that are added sequentially to the ensemble. Each subsequent model attempts to correct the predictions made by the model before it in the sequence. 
This is achieved by weighting the training dataset to put more focus on training examples on which prior models made prediction errors.<\/p>\n<p>In this tutorial, you will discover how to develop AdaBoost ensembles for classification and regression.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>AdaBoost ensemble is an ensemble created from decision trees added sequentially to the model.<\/li>\n<li>How to use the AdaBoost ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of AdaBoost model hyperparameters on model performance.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10255\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10255\" class=\"size-full wp-image-10255\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/How-to-Develop-an-AdaBoost-Ensemble-in-Python.jpg\" alt=\"How to Develop an AdaBoost Ensemble in Python\" width=\"799\" height=\"470\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Develop-an-AdaBoost-Ensemble-in-Python.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Develop-an-AdaBoost-Ensemble-in-Python-300x176.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Develop-an-AdaBoost-Ensemble-in-Python-768x452.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-10255\" class=\"wp-caption-text\">How to Develop an AdaBoost Ensemble in Python<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/rayinmanila\/25223762580\/\">Ray in Manila<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>AdaBoost Ensemble Algorithm<\/li>\n<li>AdaBoost 
Scikit-Learn API\n<ol>\n<li>AdaBoost for Classification<\/li>\n<li>AdaBoost for Regression<\/li>\n<\/ol>\n<\/li>\n<li>AdaBoost Hyperparameters\n<ol>\n<li>Explore Number of Trees<\/li>\n<li>Explore Weak Learner<\/li>\n<li>Explore Learning Rate<\/li>\n<li>Explore Alternate Algorithm<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>AdaBoost Ensemble Algorithm<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Boosting_(machine_learning)\">Boosting<\/a> refers to a class of machine learning ensemble algorithms where models are added sequentially and later models in the sequence correct the predictions made by earlier models in the sequence.<\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/AdaBoost\">AdaBoost<\/a>, short for &ldquo;<em>Adaptive Boosting<\/em>,&rdquo; is a boosting ensemble machine learning algorithm, and was one of the first successful boosting approaches.<\/p>\n<blockquote>\n<p>We call the algorithm AdaBoost because, unlike previous algorithms, it adjusts adaptively to the errors of the weak hypotheses<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/3-540-59119-2_166\">A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting<\/a>, 1996.<\/p>\n<p>AdaBoost combines the predictions from short one-level decision trees, called decision stumps, although other algorithms can also be used. Decision stumps are used because the AdaBoost algorithm seeks to use many weak models and correct their predictions by adding additional weak models.<\/p>\n<p>The training algorithm involves starting with one decision tree, finding those examples in the training dataset that were misclassified, and adding more weight to those examples. Another tree is trained on the same data, although now weighted by the misclassification errors. 
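The reweighting step just described can be sketched in a few lines of NumPy. This is an illustrative sketch of the classic Freund and Schapire update for binary labels with made-up toy predictions, not scikit-learn's exact implementation (which uses the SAMME variant of the update):

```python
# Sketch of a single AdaBoost boosting round (binary labels in {-1, +1}).
# Illustrative only: toy labels and predictions, classic AdaBoost update rule.
import numpy as np

y_true = np.array([1, 1, -1, -1, 1])          # training labels
y_pred = np.array([1, -1, -1, -1, 1])         # current weak learner's predictions
w = np.full(len(y_true), 1.0 / len(y_true))   # start with uniform example weights

miss = y_pred != y_true
err = np.sum(w[miss])                          # weighted error rate (here 0.2)
alpha = 0.5 * np.log((1.0 - err) / err)        # model weight: larger for lower error
w = w * np.exp(np.where(miss, alpha, -alpha))  # boost the misclassified examples
w = w / np.sum(w)                              # renormalize so weights sum to 1

print(alpha)  # ~0.693: this learner's say in the final weighted vote
print(w)      # the misclassified example (index 1) now carries weight 0.5
```

At prediction time the ensemble output is the sign of the alpha-weighted sum of the weak learners' votes, which is why more accurate learners dominate the result.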
This process is repeated until a desired number of trees are added.<\/p>\n<blockquote>\n<p>If a training data point is misclassified, the weight of that training data point is increased (boosted). A second classifier is built using the new weights, which are no longer equal. Again, misclassified training data have their weights boosted and the procedure is repeated.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/www.intlpress.com\/site\/pub\/pages\/journals\/items\/sii\/content\/vols\/0002\/0003\/a008\/\">Multi-class AdaBoost<\/a>, 2009.<\/p>\n<p>The algorithm was developed for classification and involves combining the predictions made by all decision trees in the ensemble. A similar approach was also developed for regression problems where predictions are made by using the average of the decision trees. The contribution of each model to the ensemble prediction is weighted based on the performance of the model on the training dataset.<\/p>\n<blockquote>\n<p>&hellip; the new algorithm needs no prior knowledge of the accuracies of the weak hypotheses. 
Rather, it adapts to these accuracies and generates a weighted majority hypothesis in which the weight of each weak hypothesis is a function of its accuracy.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/3-540-59119-2_166\">A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting<\/a>, 1996.<\/p>\n<p>Now that we are familiar with the AdaBoost algorithm, let&rsquo;s look at how we can fit AdaBoost models in Python.<\/p>\n<h2>AdaBoost Scikit-Learn API<\/h2>\n<p>AdaBoost ensembles can be implemented from scratch, although this can be challenging for beginners.<\/p>\n<p>For an example, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/boosting-and-adaboost-for-machine-learning\/\">Boosting and AdaBoost for Machine Learning<\/a><\/li>\n<\/ul>\n<p>The scikit-learn Python machine learning library provides an implementation of AdaBoost ensembles for machine learning.<\/p>\n<p>It is available in a modern version of the library.<\/p>\n<p>First, confirm that you are using a modern version of the library by running the following script:<\/p>\n<pre class=\"crayon-plain-tag\"># check scikit-learn version\r\nimport sklearn\r\nprint(sklearn.__version__)<\/pre>\n<p>Running the script will print your version of scikit-learn.<\/p>\n<p>Your version should be the same or higher. If not, you must upgrade your version of the scikit-learn library.<\/p>\n<pre class=\"crayon-plain-tag\">0.22.1<\/pre>\n<p>AdaBoost is provided via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.AdaBoostRegressor.html\">AdaBoostRegressor<\/a> and <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.AdaBoostClassifier.html\">AdaBoostClassifier<\/a> classes.<\/p>\n<p>Both models operate the same way and take the same arguments that influence how the decision trees are created.<\/p>\n<p>Randomness is used in the construction of the model. 
This means that each time the algorithm is run on the same data, it will produce a slightly different model.<\/p>\n<p>When using machine learning algorithms that have a stochastic learning algorithm, it is good practice to evaluate them by averaging their performance across multiple runs or repeats of cross-validation. When fitting a final model it may be desirable to either increase the number of trees until the variance of the model is reduced across repeated evaluations, or to fit multiple final models and average their predictions.<\/p>\n<p>Let&rsquo;s take a look at how to develop an AdaBoost ensemble for both classification and regression.<\/p>\n<h3>AdaBoost for Classification<\/h3>\n<p>In this section, we will look at using AdaBoost for a classification problem.<\/p>\n<p>First, we can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to create a synthetic binary classification problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># test classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<pre class=\"crayon-plain-tag\">(1000, 20) (1000,)<\/pre>\n<p>Next, we can evaluate an AdaBoost algorithm on this dataset.<\/p>\n<p>We will evaluate the model using <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">repeated stratified k-fold cross-validation<\/a>, with three repeats and 10 folds. 
We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate adaboost algorithm for classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n# define the model\r\nmodel = AdaBoostClassifier()\r\n# evaluate the model\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation accuracy of the model.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. 
Try running the example a few times.<\/p>\n<p>In this case, we can see the AdaBoost ensemble with default hyperparameters achieves a classification accuracy of about 80 percent on this test dataset.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.806 (0.041)<\/pre>\n<p>We can also use the AdaBoost model as a final model and make predictions for classification.<\/p>\n<p>First, the AdaBoost ensemble is fit on all available data, then the <em>predict()<\/em> function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># make predictions using adaboost for classification\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n# define the model\r\nmodel = AdaBoostClassifier()\r\n# fit the model on the whole dataset\r\nmodel.fit(X, y)\r\n# make a single prediction\r\nrow = [[-3.47224758,1.95378146,0.04875169,-0.91592588,-3.54022468,1.96405547,-7.72564954,-2.64787168,-1.81726906,-1.67104974,2.33762043,-4.30273117,0.4839841,-1.28253034,-10.6704077,-0.7641103,-3.58493721,2.07283886,0.08385173,0.91461126]]\r\nyhat = model.predict(row)\r\nprint('Predicted Class: %d' % yhat[0])<\/pre>\n<p>Running the example fits the AdaBoost ensemble model on the entire dataset and is then used to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<pre class=\"crayon-plain-tag\">Predicted Class: 0<\/pre>\n<p>Now that we are familiar with using AdaBoost for classification, let&rsquo;s look at the API for regression.<\/p>\n<h3>AdaBoost for Regression<\/h3>\n<p>In this section, we will look at using AdaBoost for a regression problem.<\/p>\n<p>First, we can use the <a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">make_regression() function<\/a> to create a synthetic regression problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># test regression dataset\r\nfrom sklearn.datasets import make_regression\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=6)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<pre class=\"crayon-plain-tag\">(1000, 20) (1000,)<\/pre>\n<p>Next, we can evaluate an AdaBoost algorithm on this dataset.<\/p>\n<p>As we did with the last section, we will evaluate the model using repeated k-fold cross-validation, with three repeats and 10 folds. We will report the mean absolute error (MAE) of the model across all repeats and folds. The scikit-learn library makes the MAE negative so that it is maximized instead of minimized. 
This means that negative MAE values closer to zero are better and a perfect model has a MAE of 0.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate adaboost ensemble for regression\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.ensemble import AdaBoostRegressor\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=6)\r\n# define the model\r\nmodel = AdaBoostRegressor()\r\n# evaluate the model\r\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')\r\n# report performance\r\nprint('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation MAE of the model.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. 
Try running the example a few times.<\/p>\n<p>In this case, we can see the AdaBoost ensemble with default hyperparameters achieves a MAE of about 72.<\/p>\n<pre class=\"crayon-plain-tag\">MAE: -72.327 (4.041)<\/pre>\n<p>We can also use the AdaBoost model as a final model and make predictions for regression.<\/p>\n<p>First, the AdaBoost ensemble is fit on all available data, then the <em>predict()<\/em> function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our regression dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># adaboost ensemble for making predictions for regression\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.ensemble import AdaBoostRegressor\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=6)\r\n# define the model\r\nmodel = AdaBoostRegressor()\r\n# fit the model on the whole dataset\r\nmodel.fit(X, y)\r\n# make a single prediction\r\nrow = [[1.20871625,0.88440466,-0.9030013,-0.22687731,-0.82940077,-1.14410988,1.26554256,-0.2842871,1.43929072,0.74250241,0.34035501,0.45363034,0.1778756,-1.75252881,-1.33337384,-1.50337215,-0.45099008,0.46160133,0.58385557,-1.79936198]]\r\nyhat = model.predict(row)\r\nprint('Prediction: %d' % yhat[0])<\/pre>\n<p>Running the example fits the AdaBoost ensemble model on the entire dataset and is then used to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<pre class=\"crayon-plain-tag\">Prediction: -10<\/pre>\n<p>Now that we are familiar with using the scikit-learn API to evaluate and use AdaBoost ensembles, let&rsquo;s look at configuring the model.<\/p>\n<h2>AdaBoost Hyperparameters<\/h2>\n<p>In this section, we will take a closer look at some of the hyperparameters you should consider tuning for the AdaBoost ensemble and their effect on model performance.<\/p>\n<h3>Explore Number of Trees<\/h3>\n<p>An important hyperparameter for the AdaBoost 
algorithm is the number of decision trees used in the ensemble.<\/p>\n<p>Recall that each decision tree used in the ensemble is designed to be a weak learner. That is, it has skill over random prediction, but is not highly skillful. As such, one-level decision trees are used, called decision stumps.<\/p>\n<p>The number of trees added to the model must be high for the model to work well, often hundreds, if not thousands.<\/p>\n<p>The number of trees can be set via the &ldquo;<em>n_estimators<\/em>&rdquo; argument and defaults to 50.<\/p>\n<p>The example below explores the effect of the number of trees with values between 10 and 5,000.<\/p>\n<pre class=\"crayon-plain-tag\"># explore adaboost ensemble number of trees effect on performance\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tmodels['10'] = AdaBoostClassifier(n_estimators=10)\r\n\tmodels['50'] = AdaBoostClassifier(n_estimators=50)\r\n\tmodels['100'] = AdaBoostClassifier(n_estimators=100)\r\n\tmodels['500'] = AdaBoostClassifier(n_estimators=500)\r\n\tmodels['1000'] = AdaBoostClassifier(n_estimators=1000)\r\n\tmodels['5000'] = AdaBoostClassifier(n_estimators=5000)\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = 
get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean accuracy for each configured number of decision trees.<\/p>\n<p>In this case, we can see that performance improves on this dataset until about 50 trees and declines after that. This might be a sign of the ensemble overfitting the training dataset after additional trees are added.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;10 0.773 (0.039)\r\n&gt;50 0.806 (0.041)\r\n&gt;100 0.801 (0.032)\r\n&gt;500 0.793 (0.028)\r\n&gt;1000 0.791 (0.032)\r\n&gt;5000 0.782 (0.031)<\/pre>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each configured number of trees.<\/p>\n<p>We can see the general trend of model performance and ensemble size.<\/p>\n<div id=\"attachment_10252\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10252\" class=\"size-full wp-image-10252\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Size-vs-Classification-Accuracy.png\" alt=\"Box Plot of AdaBoost Ensemble Size vs. 
Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Size-vs-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Size-vs-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Size-vs-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Size-vs-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10252\" class=\"wp-caption-text\">Box Plot of AdaBoost Ensemble Size vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Weak Learner<\/h3>\n<p>A decision tree with one level is used as the weak learner by default.<\/p>\n<p>We can make the models used in the ensemble less weak (more skillful) by increasing the depth of the decision tree.<\/p>\n<p>The example below explores the effect of increasing the depth of the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeClassifier.html\">DecisionTreeClassifier<\/a> weak learner on the AdBoost ensemble.<\/p>\n<pre class=\"crayon-plain-tag\"># explore adaboost ensemble tree depth effect on performance\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.tree import DecisionTreeClassifier\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\tX, y = make_classification(n_samples=1000, n_features=20, 
n_informative=15, n_redundant=5, random_state=6)\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tfor i in range(1,11):\r\n\t\tmodels[str(i)] = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=i))\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean accuracy for each configured weak learner tree depth.<\/p>\n<p>In this case, we can see that as the depth of the decision trees is increased, the performance of the ensemble is also increased on this dataset.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;1 0.806 (0.041)\r\n&gt;2 0.864 (0.028)\r\n&gt;3 0.867 (0.030)\r\n&gt;4 0.889 (0.029)\r\n&gt;5 0.909 (0.021)\r\n&gt;6 0.923 (0.020)\r\n&gt;7 0.927 (0.025)\r\n&gt;8 0.928 (0.028)\r\n&gt;9 0.923 (0.017)\r\n&gt;10 0.926 (0.030)<\/pre>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each configured weak learner depth.<\/p>\n<p>We can see the general trend of model performance and weak learner depth.<\/p>\n<div id=\"attachment_10253\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10253\" 
class=\"size-full wp-image-10253\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Weak-Learner-Depth-vs-Classification-Accuracy.png\" alt=\"Box Plot of AdaBoost Ensemble Weak Learner Depth vs. Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Weak-Learner-Depth-vs-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Weak-Learner-Depth-vs-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Weak-Learner-Depth-vs-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Weak-Learner-Depth-vs-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10253\" class=\"wp-caption-text\">Box Plot of AdaBoost Ensemble Weak Learner Depth vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Learning Rate<\/h3>\n<p>AdaBoost also supports a learning rate that controls the contribution of each model to the ensemble prediction.<\/p>\n<p>This is controlled by the &ldquo;<em>learning_rate<\/em>&rdquo; argument and by default is set to 1.0 or full contribution. Smaller or larger values might be appropriate depending on the number of models used in the ensemble. 
There is a balance between the contribution of the models and the number of trees in the ensemble.<\/p>\n<p>More trees may require a smaller learning rate; fewer trees may require a larger learning rate.<\/p>\n<p>The example below explores learning rate values between 0.1 and 2.0 in 0.1 increments.<\/p>\n<pre class=\"crayon-plain-tag\"># explore adaboost ensemble learning rate effect on performance\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom numpy import arange\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tfor i in arange(0.1, 2.1, 0.1):\r\n\t\tkey = '%.3f' % i\r\n\t\tmodels[key] = AdaBoostClassifier(learning_rate=i)\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.xticks(rotation=45)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean 
accuracy for each configured learning rate.<\/p>\n<p>In this case, we can see similar values between 0.5 and 1.0 and a decrease in model performance after that.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;0.100 0.767 (0.049)\r\n&gt;0.200 0.786 (0.042)\r\n&gt;0.300 0.802 (0.040)\r\n&gt;0.400 0.798 (0.037)\r\n&gt;0.500 0.805 (0.042)\r\n&gt;0.600 0.795 (0.031)\r\n&gt;0.700 0.799 (0.035)\r\n&gt;0.800 0.801 (0.033)\r\n&gt;0.900 0.805 (0.032)\r\n&gt;1.000 0.806 (0.041)\r\n&gt;1.100 0.801 (0.037)\r\n&gt;1.200 0.800 (0.030)\r\n&gt;1.300 0.799 (0.041)\r\n&gt;1.400 0.793 (0.041)\r\n&gt;1.500 0.790 (0.040)\r\n&gt;1.600 0.775 (0.034)\r\n&gt;1.700 0.767 (0.054)\r\n&gt;1.800 0.768 (0.040)\r\n&gt;1.900 0.736 (0.047)\r\n&gt;2.000 0.682 (0.048)<\/pre>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each configured learning rate.<\/p>\n<p>We can see the general trend of decreasing model performance with a learning rate larger than 1.0 on this dataset.<\/p>\n<div id=\"attachment_10254\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10254\" class=\"size-full wp-image-10254\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Learning-Rate-vs-Classification-Accuracy.png\" alt=\"Box Plot of AdaBoost Ensemble Learning Rate vs. 
Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Learning-Rate-vs-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Learning-Rate-vs-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Learning-Rate-vs-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plot-of-AdaBoost-Ensemble-Learning-Rate-vs-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10254\" class=\"wp-caption-text\">Box Plot of AdaBoost Ensemble Learning Rate vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Alternate Algorithm<\/h3>\n<p>The default algorithm used in the ensemble is a decision tree, although other algorithms can be used.<\/p>\n<p>The intent is to use very simple models, called weak learners. Also, the scikit-learn implementation requires that any models used must also support weighted samples, as they are how the ensemble is created by fitting models based on a weighted version of the training dataset.<\/p>\n<p>The base model can be specified via the &ldquo;<em>base_estimator<\/em>&rdquo; argument. The base model must also support predicting probabilities or probability-like scores in the case of classification. 
If the specified model does not support a weighted training dataset, you will see an error message as follows:<\/p>\n<pre class=\"crayon-plain-tag\">ValueError: KNeighborsClassifier doesn't support sample_weight.<\/pre>\n<p>One example of a model that supports weighted training is the logistic regression algorithm.<\/p>\n<p>The example below demonstrates an AdaBoost algorithm with a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">LogisticRegression<\/a> weak learner.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate adaboost algorithm with logistic regression weak learner for classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.linear_model import LogisticRegression\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)\r\n# define the model\r\nmodel = AdaBoostClassifier(base_estimator=LogisticRegression())\r\n# evaluate the model\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation accuracy of the model.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. 
Try running the example a few times.<\/p>\n<p>In this case, we can see the AdaBoost ensemble with a logistic regression weak learner achieves a classification accuracy of about 79 percent on this test dataset.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.794 (0.032)<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/boosting-and-adaboost-for-machine-learning\/\">Boosting and AdaBoost for Machine Learning<\/a><\/li>\n<\/ul>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/3-540-59119-2_166\">A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting<\/a>, 1996.<\/li>\n<li><a href=\"https:\/\/www.intlpress.com\/site\/pub\/pages\/journals\/items\/sii\/content\/vols\/0002\/0003\/a008\/\">Multi-class AdaBoost<\/a>, 2009.<\/li>\n<li><a href=\"http:\/\/professordrucker.com\/Pubications\/ImprovingRegressorsUsingBoostingTechniques.pdf\">Improving Regressors using Boosting Techniques<\/a>, 1997.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.AdaBoostRegressor.html\">sklearn.ensemble.AdaBoostRegressor API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.AdaBoostClassifier.html\">sklearn.ensemble.AdaBoostClassifier API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Boosting_(machine_learning)\">Boosting (machine learning), Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/AdaBoost\">AdaBoost, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop AdaBoost ensembles for classification and regression.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>AdaBoost ensemble is an ensemble created from decision trees 
added sequentially to the model.<\/li>\n<li>How to use the AdaBoost ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of AdaBoost model hyperparameters on model performance.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/adaboost-ensemble-in-python\/\">How to Develop an AdaBoost Ensemble in Python<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/adaboost-ensemble-in-python\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Boosting is a class of ensemble machine learning algorithms that involve combining the predictions from many weak learners. A weak learner is [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/04\/30\/how-to-develop-an-adaboost-ensemble-in-python\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3398,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3397"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3397"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3397\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3398"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}