{"id":4094,"date":"2020-11-15T18:00:52","date_gmt":"2020-11-15T18:00:52","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/11\/15\/develop-a-bagging-ensemble-with-different-data-transformations\/"},"modified":"2020-11-15T18:00:52","modified_gmt":"2020-11-15T18:00:52","slug":"develop-a-bagging-ensemble-with-different-data-transformations","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/11\/15\/develop-a-bagging-ensemble-with-different-data-transformations\/","title":{"rendered":"Develop a Bagging Ensemble with Different Data Transformations"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Bootstrap aggregation, or bagging, is an ensemble where each model is trained on a different sample of the training dataset.<\/p>\n<p>The idea of bagging can be generalized to other techniques for changing the training dataset and fitting the same model on each changed version of the data. One approach is to use data transforms that change the scale and probability distribution of input variables as the basis for the training of contributing members to a bagging-like ensemble. 
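<\/p>
<p>As a minimal sketch of this idea, the same model type can be fit on differently transformed views of one dataset and the predictions averaged. The transforms, model, and data below are illustrative choices only, not the tutorial&rsquo;s setup:<\/p>

```python
# fit one model type on differently transformed views of the same
# data, then average the predictions (a data-transform ensemble sketch)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# small illustrative dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=1)
# each member pairs a different transform with the same model type
members = [
    Pipeline([('t', MinMaxScaler()), ('m', DecisionTreeRegressor(random_state=1))]),
    Pipeline([('t', StandardScaler()), ('m', DecisionTreeRegressor(random_state=2))]),
]
for member in members:
    member.fit(X, y)
# combine the members' predictions with a simple average
yhat = np.mean([member.predict(X) for member in members], axis=0)
print(yhat.shape)
```

<p>Each member sees the same rows but a different representation of them; the scikit-learn voting ensembles used later in this tutorial formalize exactly this combination step.<\/p>
<p>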
We can refer to this as data transform bagging or a data transform ensemble.<\/p>\n<p>In this tutorial, you will discover how to develop a data transform ensemble.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Data transforms can be used as the basis for a bagging-type ensemble where the same model is trained on different views of a training dataset.<\/li>\n<li>How to develop a data transform ensemble for classification and confirm the ensemble performs better than any contributing member.<\/li>\n<li>How to develop and evaluate a data transform ensemble for regression predictive modeling.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_11195\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11195\" loading=\"lazy\" class=\"size-full wp-image-11195\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/11\/Develop-a-Bagging-Ensemble-with-Different-Data-Transformations.jpg\" alt=\"Develop a Bagging Ensemble with Different Data Transformations\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Develop-a-Bagging-Ensemble-with-Different-Data-Transformations.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Develop-a-Bagging-Ensemble-with-Different-Data-Transformations-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Develop-a-Bagging-Ensemble-with-Different-Data-Transformations-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-11195\" class=\"wp-caption-text\">Develop a Bagging Ensemble with Different Data Transformations<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/138892959@N03\/33278904900\/\">Maciej Kraus<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial 
Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Data Transform Bagging<\/li>\n<li>Data Transform Ensemble for Classification<\/li>\n<li>Data Transform Ensemble for Regression<\/li>\n<\/ol>\n<h2>Data Transform Bagging<\/h2>\n<p><a href=\"https:\/\/machinelearningmastery.com\/bagging-ensemble-with-python\/\">Bootstrap aggregation<\/a>, or bagging for short, is an ensemble learning technique based on the idea of fitting the same model type on multiple different samples of the same training dataset.<\/p>\n<p>The hope is that small differences in the training dataset used to fit each model will result in small differences in the capabilities of the models. For ensemble learning, this is referred to as diversity of ensemble members and is intended to de-correlate the predictions (or prediction errors) made by each contributing member.<\/p>\n<p>Although bagging was designed to be used with decision trees, with each data sample made using the bootstrap method (selection with replacement), the approach has spawned a whole subfield of study with hundreds of variations.<\/p>\n<p>We can construct our own bagging ensembles by changing the dataset used to train each contributing member in new and unique ways.<\/p>\n<p>One approach would be to apply a different data preparation transform to the dataset for each contributing ensemble member.<\/p>\n<p>This is based on the premise that we cannot know the representational form for a training dataset that exposes the unknown underlying structure of the dataset to the learning algorithms. 
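<\/p>
<p>To see that these representations really are different views, we can apply a few standard scikit-learn transforms to a skewed synthetic feature and inspect the results; the data and transform settings here are illustrative only:<\/p>

```python
# different transforms produce genuinely different views of the same feature
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, QuantileTransformer

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=(1000, 1))  # right-skewed synthetic feature

scaled = MinMaxScaler().fit_transform(x)      # squashed into a fixed [0, 1] range
standard = StandardScaler().fit_transform(x)  # rescaled to zero mean, unit variance
# reshaped toward a Gaussian distribution
normal = QuantileTransformer(n_quantiles=100, output_distribution='normal').fit_transform(x)

print(scaled.min(), scaled.max())  # 0.0 1.0
```

<p>Which of these views best exposes the structure of a given dataset to a given learning algorithm cannot be known beforehand.<\/p>
<p>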
This motivates the need to evaluate models with a suite of different data transforms, such as changing the scale and probability distribution, in order to discover what works.<\/p>\n<p>This approach can be used where a suite of different transforms of the same training dataset is created, a model trained on each, and the predictions combined using simple statistics such as averaging.<\/p>\n<p>For lack of a better name, we will refer to this as &ldquo;<strong>Data Transform Bagging<\/strong>&rdquo; or a &ldquo;<strong>Data Transform Ensemble<\/strong>.&rdquo;<\/p>\n<p>There are many transforms that we can use, but perhaps a good starting point would be a selection that changes the scale and probability distribution, such as:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/standardscaler-and-minmaxscaler-transforms-in-python\/\">Normalization<\/a> (fixed range)<\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/standardscaler-and-minmaxscaler-transforms-in-python\/\">Standardization<\/a> (zero mean)<\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/robust-scaler-transforms-for-machine-learning\/\">Robust Standardization<\/a> (robust to outliers)<\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/power-transforms-with-scikit-learn\/\">Power Transform<\/a> (remove skew)<\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/quantile-transforms-for-machine-learning\/\">Quantile Transform<\/a> (change distribution)<\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/discretization-transforms-for-machine-learning\/\">Discretization<\/a> (k-bins)<\/li>\n<\/ul>\n<p>The approach is likely to be more effective when the data transforms cause the base model to fit different, or very different, models.<\/p>\n<p>Changing the scale may only be appropriate for models that are sensitive to the scale of the input variables, such as those that calculate a weighted sum, such as 
logistic regression and neural networks, and those that use distance measures, such as k-nearest neighbors and support vector machines.<\/p>\n<p>Changes to the probability distribution for input variables would likely impact most machine learning models.<\/p>\n<p>Now that we are familiar with the approach, let&rsquo;s explore how we can develop a data transform ensemble for classification problems.<\/p>\n<h2>Data Transform Ensemble for Classification<\/h2>\n<p>We can develop a data transform approach to bagging for classification using the scikit-learn library.<\/p>\n<p>The library provides a suite of standard transforms that we can use directly. Each ensemble member can be defined as a Pipeline, with the transform followed by the predictive model, in order to avoid any data leakage and, in turn, optimistic results. Finally, a voting ensemble can be used to combine the predictions from each pipeline.<\/p>\n<p>First, we can define a synthetic binary classification dataset as the basis for exploring this type of ensemble.<\/p>\n<p>The example below creates a dataset with 1,000 examples, each comprising 20 input features, where 15 of them contain information for predicting the target.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># synthetic classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the data arrays, confirming our expectations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 20) (1000,)<\/pre>\n<p>Next, we establish a baseline on the problem using the predictive model we intend to use in our ensemble. 
It is standard practice to use a decision tree in bagging ensembles, so in this case, we will use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeClassifier.html\">DecisionTreeClassifier<\/a> with default hyperparameters.<\/p>\n<p>We will evaluate the model using standard practices, in this case, repeated stratified k-fold cross-validation with three repeats and 10 folds. The performance will be reported using the mean of the classification accuracy across all folds and repeats.<\/p>\n<p>The complete example of evaluating a decision tree on the synthetic classification dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate decision tree on synthetic classification dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.tree import DecisionTreeClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# define the model\r\nmodel = DecisionTreeClassifier()\r\n# define the evaluation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean classification accuracy of the decision tree on the synthetic classification dataset.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the model achieved a classification accuracy of about 82.3 percent.<\/p>\n<p>This score provides a baseline in performance that we expect a data transform ensemble to improve upon.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.823 (0.039)<\/pre>\n<p>Next, we can develop an ensemble of decision trees, each fit on a different transform of the input data.<\/p>\n<p>First, we can define each ensemble member as a modeling pipeline. The first step will be the data transform and the second will be a decision tree classifier.<\/p>\n<p>For example, the pipeline for a normalization transform with the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.MinMaxScaler.html\">MinMaxScaler<\/a> class would look as follows:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# normalization\r\nnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeClassifier())])<\/pre>\n<p>We can repeat this for each transform or transform configuration that we want to use and add all of the model pipelines to a list.<\/p>\n<p>The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.VotingClassifier.html\">VotingClassifier<\/a> class can be used to combine the predictions from all of the models. This class takes an &ldquo;<em>estimators<\/em>&rdquo; argument that is a list of tuples where each tuple has a name and the model or modeling pipeline. 
For example:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# normalization\r\nnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeClassifier())])\r\nmodels.append(('norm', norm))\r\n...\r\n# define the voting ensemble\r\nensemble = VotingClassifier(estimators=models, voting='hard')<\/pre>\n<p>To make the code easier to read, we can define a function <em>get_ensemble()<\/em> to create the members and data transform ensemble itself.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingClassifier(estimators=models, voting='hard')\r\n\treturn ensemble<\/pre>\n<p>We can then call this function and evaluate the voting ensemble as per normal, just like we did for the decision tree above.<\/p>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate data transform bagging ensemble on a classification dataset\r\nfrom numpy import mean\r\nfrom 
numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import RobustScaler\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import QuantileTransformer\r\nfrom sklearn.preprocessing import KBinsDiscretizer\r\nfrom sklearn.tree import DecisionTreeClassifier\r\nfrom sklearn.ensemble import VotingClassifier\r\nfrom sklearn.pipeline import Pipeline\r\n\r\n# get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingClassifier(estimators=models, voting='hard')\r\n\treturn ensemble\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# get models\r\nensemble = get_ensemble()\r\n# define the evaluation procedure\r\ncv = 
RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model\r\nn_scores = cross_val_score(ensemble, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean classification accuracy of the data transform ensemble on the synthetic classification dataset.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the data transform ensemble achieved a classification accuracy of about 83.8 percent, which is a lift over using a decision tree alone that achieved an accuracy of about 82.3 percent.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.838 (0.042)<\/pre>\n<p>Although the ensemble performed well compared to a single decision tree, a limitation of this test is that we do not know if the ensemble performed better than any contributing member.<\/p>\n<p>This is important, as if a contributing member to the ensemble performs better, then it would be simpler and easier to use the member itself as the model instead of the ensemble.<\/p>\n<p>We can check this by evaluating the performance of each individual model and comparing the results to the ensemble.<\/p>\n<p>First, we can update the <em>get_ensemble()<\/em> function to return a list of models to evaluate composed of the individual ensemble members as well as the ensemble itself.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', 
DecisionTreeClassifier())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingClassifier(estimators=models, voting='hard')\r\n\t# return a list of tuples each with a name and model\r\n\treturn models + [('ensemble', ensemble)]<\/pre>\n<p>We can call this function and enumerate each model, evaluating it, reporting the performance, and storing the results.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# get models\r\nmodels = get_ensemble()\r\n# evaluate each model\r\nresults = list()\r\nfor name,model in models:\r\n\t# define the evaluation method\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate the model on the dataset\r\n\tn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# report performance\r\n\tprint('&gt;%s: %.3f (%.3f)' % (name, mean(n_scores), std(n_scores)))\r\n\tresults.append(n_scores)<\/pre>\n<p>Finally, we can plot the distribution of accuracy scores as box and whisker plots side by side and compare the distribution of scores directly.<\/p>\n<p>Visually, we would hope that the spread of scores for the ensemble skews higher than any individual member and that the central tendency of 
the distribution (mean and median) are also higher than any member.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# plot the results for comparison\r\npyplot.boxplot(results, labels=[n for n,_ in models], showmeans=True)\r\npyplot.show()<\/pre>\n<p>Tying this together, the complete example of comparing the performance of contributing members to the performance of the data transform ensemble is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># comparison of data transform ensemble to each contributing member for classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import RobustScaler\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import QuantileTransformer\r\nfrom sklearn.preprocessing import KBinsDiscretizer\r\nfrom sklearn.tree import DecisionTreeClassifier\r\nfrom sklearn.ensemble import VotingClassifier\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n\r\n# get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', 
QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeClassifier())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingClassifier(estimators=models, voting='hard')\r\n\t# return a list of tuples each with a name and model\r\n\treturn models + [('ensemble', ensemble)]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# get models\r\nmodels = get_ensemble()\r\n# evaluate each model\r\nresults = list()\r\nfor name,model in models:\r\n\t# define the evaluation method\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate the model on the dataset\r\n\tn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# report performance\r\n\tprint('&gt;%s: %.3f (%.3f)' % (name, mean(n_scores), std(n_scores)))\r\n\tresults.append(n_scores)\r\n# plot the results for comparison\r\npyplot.boxplot(results, labels=[n for n,_ in models], showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean and standard classification accuracy of each individual model, ending with the performance of the ensemble that combines the models.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that a number of the individual members perform well, such as &ldquo;<em>kbins<\/em>&rdquo; that achieves an accuracy of about 83.3 percent, and &ldquo;<em>std<\/em>&rdquo; that achieves an accuracy of about 83.1 percent. We can also see that the ensemble achieves better overall performance compared to any contributing member, with an accuracy of about 83.4 percent.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;norm: 0.821 (0.041)\r\n&gt;std: 0.831 (0.045)\r\n&gt;robust: 0.826 (0.044)\r\n&gt;power: 0.825 (0.045)\r\n&gt;quant: 0.817 (0.042)\r\n&gt;kbins: 0.833 (0.035)\r\n&gt;ensemble: 0.834 (0.040)<\/pre>\n<p>A figure is also created showing box and whisker plots of classification accuracy for each individual model as well as the data transform ensemble.<\/p>\n<p>We can see that the distribution for the ensemble is skewed up, which is what we might hope, and that the mean (green triangle) is slightly higher than those of the individual ensemble members.<\/p>\n<div id=\"attachment_11193\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11193\" loading=\"lazy\" class=\"size-full wp-image-11193\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-Accuracy-Distribution-for-Individual-Models-and-Data-Transform-Ensemble.png\" alt=\"Box and Whisker Plot of Accuracy Distribution for Individual Models and Data Transform Ensemble\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-Accuracy-Distribution-for-Individual-Models-and-Data-Transform-Ensemble.png 1280w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-Accuracy-Distribution-for-Individual-Models-and-Data-Transform-Ensemble-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-Accuracy-Distribution-for-Individual-Models-and-Data-Transform-Ensemble-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-Accuracy-Distribution-for-Individual-Models-and-Data-Transform-Ensemble-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-11193\" class=\"wp-caption-text\">Box and Whisker Plot of Accuracy Distribution for Individual Models and Data Transform Ensemble<\/p>\n<\/div>\n<p>Now that we are familiar with how to develop a data transform ensemble for classification, let&rsquo;s look at doing the same for regression.<\/p>\n<h2>Data Transform Ensemble for Regression<\/h2>\n<p>In this section, we will explore developing a data transform ensemble for a regression predictive modeling problem.<\/p>\n<p>First, we can define a synthetic regression dataset as the basis for exploring this type of ensemble.<\/p>\n<p>The example below creates a dataset with 1,000 examples, each with 100 input features, where 10 of them contain information for predicting the target.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># synthetic regression dataset\r\nfrom sklearn.datasets import make_regression\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and confirms the data has the expected shape.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 100) (1000,)<\/pre>\n<p>Next, we can establish a baseline in performance on the 
synthetic dataset by fitting and evaluating the base model that we intend to use in the ensemble, in this case, a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeRegressor.html\">DecisionTreeRegressor<\/a>.<\/p>\n<p>The model will be evaluated using repeated k-fold cross-validation with three repeats and 10 folds. Model performance on the dataset will be reported using the mean absolute error, or MAE. The scikit-learn library inverts the score (makes it negative) so that the framework can maximize it. As such, we can ignore the sign of the score.<\/p>\n<p>The example below evaluates the decision tree on the synthetic regression dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate decision tree on synthetic regression dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.tree import DecisionTreeRegressor\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# define the model\r\nmodel = DecisionTreeRegressor()\r\n# define the evaluation procedure\r\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model\r\nn_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the MAE of the decision tree on the synthetic regression dataset.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the model achieved a MAE of about 139.817. This provides a floor in performance that we expect the ensemble model to improve upon.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">MAE: -139.817 (12.449)<\/pre>\n<p>Next, we can develop and evaluate the ensemble.<\/p>\n<p>We will use the same data transforms from the previous section. The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.VotingRegressor.html\">VotingRegressor<\/a> will be used to combine the predictions, which is appropriate for regression problems.<\/p>\n<p>The get_ensemble() function defined below creates the individual models and the ensemble model and combines all of the models as a list of tuples for evaluation.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingRegressor(estimators=models)\r\n\t# return a list of 
tuples each with a name and model\r\n\treturn models + [('ensemble', ensemble)]<\/pre>\n<p>We can then call this function and evaluate each contributing modeling pipeline independently and compare the results to the ensemble of the pipelines.<\/p>\n<p>Our expectation, as before, is that the ensemble results in a lift in performance over any individual model. If it does not, then the top-performing individual model should be chosen instead.<\/p>\n<p>Tying this together, the complete example for evaluating a data transform ensemble for a regression dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># comparison of data transform ensemble to each contributing member for regression\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import RobustScaler\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import QuantileTransformer\r\nfrom sklearn.preprocessing import KBinsDiscretizer\r\nfrom sklearn.tree import DecisionTreeRegressor\r\nfrom sklearn.ensemble import VotingRegressor\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n\r\n# get a voting ensemble of models\r\ndef get_ensemble():\r\n\t# define the base models\r\n\tmodels = list()\r\n\t# normalization\r\n\tnorm = Pipeline([('s', MinMaxScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('norm', norm))\r\n\t# standardization\r\n\tstd = Pipeline([('s', StandardScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('std', std))\r\n\t# robust\r\n\trobust = Pipeline([('s', RobustScaler()), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('robust', robust))\r\n\t# power\r\n\tpower = Pipeline([('s', PowerTransformer()), 
('m', DecisionTreeRegressor())])\r\n\tmodels.append(('power', power))\r\n\t# quantile\r\n\tquant = Pipeline([('s', QuantileTransformer(n_quantiles=100, output_distribution='normal')), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('quant', quant))\r\n\t# kbins\r\n\tkbins = Pipeline([('s', KBinsDiscretizer(n_bins=20, encode='ordinal')), ('m', DecisionTreeRegressor())])\r\n\tmodels.append(('kbins', kbins))\r\n\t# define the voting ensemble\r\n\tensemble = VotingRegressor(estimators=models)\r\n\t# return a list of tuples each with a name and model\r\n\treturn models + [('ensemble', ensemble)]\r\n\r\n# generate regression dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# get models\r\nmodels = get_ensemble()\r\n# evaluate each model\r\nresults = list()\r\nfor name,model in models:\r\n\t# define the evaluation method\r\n\tcv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate the model on the dataset\r\n\tn_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)\r\n\t# report performance\r\n\tprint('&gt;%s: %.3f (%.3f)' % (name, mean(n_scores), std(n_scores)))\r\n\tresults.append(n_scores)\r\n# plot the results for comparison\r\npyplot.boxplot(results, labels=[n for n,_ in models], showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the MAE of each individual model, ending with the performance of the ensemble that combines the models.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>We can see that each model performs about the same, with MAE error scores around 140, all higher than the decision tree used in isolation. 
Interestingly, the ensemble performs the best, out-performing all of the individual members and the tree with no transforms, achieving an MAE of about 126.487.<\/p>\n<p>This result suggests that although each pipeline performs worse than a single tree without transforms, each pipeline makes different errors, and averaging the models is able to harness these differences to achieve a lower error.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;norm: -140.559 (11.783)\r\n&gt;std: -140.582 (11.996)\r\n&gt;robust: -140.813 (11.827)\r\n&gt;power: -141.089 (12.668)\r\n&gt;quant: -141.109 (11.097)\r\n&gt;kbins: -145.134 (11.638)\r\n&gt;ensemble: -126.487 (9.999)<\/pre>\n<p>A figure is created comparing the distribution of MAE scores for each pipeline and the ensemble.<\/p>\n<p>As we hoped, the distribution of scores for the ensemble skews higher than all of the other models (recall that the scores are negative MAE, so higher means lower error) and has a better central tendency (the mean and median are indicated by the green triangle and orange line respectively).<\/p>\n<div id=\"attachment_11194\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11194\" loading=\"lazy\" class=\"size-full wp-image-11194\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-MAE-Distributions-for-Individual-Models-and-Data-Transform-Ensemble.png\" alt=\"Box and Whisker Plot of MAE Distributions for Individual Models and Data Transform Ensemble\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-MAE-Distributions-for-Individual-Models-and-Data-Transform-Ensemble.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-MAE-Distributions-for-Individual-Models-and-Data-Transform-Ensemble-300x225.png 300w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-MAE-Distributions-for-Individual-Models-and-Data-Transform-Ensemble-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/Box-and-Whisker-Plot-of-MAE-Distributions-for-Individual-Models-and-Data-Transform-Ensemble-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-11194\" class=\"wp-caption-text\">Box and Whisker Plot of MAE Distributions for Individual Models and Data Transform Ensemble<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/voting-ensembles-with-python\/\">How to Develop Voting Ensembles With Python<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2zxc0F7\">Pattern Classification Using Ensemble Methods<\/a>, 2010.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2C7syo5\">Ensemble Machine Learning<\/a>, 2012.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.VotingClassifier.html\">sklearn.ensemble.VotingClassifier API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.VotingRegressor.html\">sklearn.ensemble.VotingRegressor API<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a data transform ensemble.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Data transforms can be used as the basis for a bagging-type ensemble where the same model is trained on different views of a training dataset.<\/li>\n<li>How to develop a data transform ensemble for classification and confirm the ensemble performs better than any 
contributing member.<\/li>\n<li>How to develop and evaluate a data transform ensemble for regression predictive modeling.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/bagging-ensemble-with-different-data-transformations\/\">Develop a Bagging Ensemble with Different Data Transformations<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/bagging-ensemble-with-different-data-transformations\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Bootstrap aggregation, or bagging, is an ensemble where each model is trained on a different sample of the training dataset. The idea [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/11\/15\/develop-a-bagging-ensemble-with-different-data-transformations\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4095,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4094"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4094"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4094\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4095"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4094"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4094"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}