{"id":4027,"date":"2020-10-29T18:00:44","date_gmt":"2020-10-29T18:00:44","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/29\/how-to-develop-a-random-subspace-ensemble-with-python\/"},"modified":"2020-10-29T18:00:44","modified_gmt":"2020-10-29T18:00:44","slug":"how-to-develop-a-random-subspace-ensemble-with-python","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/29\/how-to-develop-a-random-subspace-ensemble-with-python\/","title":{"rendered":"How to Develop a Random Subspace Ensemble With Python"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><strong>Random Subspace Ensemble<\/strong> is a machine learning algorithm that combines the predictions from multiple decision trees trained on different subsets of columns in the training dataset.<\/p>\n<p>Randomly varying the columns used to train each contributing member of the ensemble has the effect of introducing diversity into the ensemble and, in turn, can lift performance over using a single decision tree.<\/p>\n<p>It is related to other ensembles of decision trees such as bootstrap aggregation (bagging) that creates trees using different samples of rows from the training dataset, and random forest that combines ideas from bagging and the random subspace ensemble.<\/p>\n<p>Although decision trees are often used, the general random subspace method can be used with any machine learning model whose performance varies meaningfully with the choice of input features.<\/p>\n<p>In this tutorial, you will discover how to develop random subspace ensembles for classification and regression.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Random subspace ensembles are created from decision trees fit on different samples of features (columns) in the training dataset.<\/li>\n<li>How to use the random subspace ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of random subspace 
model hyperparameters on model performance.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_11151\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11151\" loading=\"lazy\" class=\"size-full wp-image-11151\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/11\/How-to-Develop-a-Random-Subspace-Ensemble-With-Python.jpg\" alt=\"How to Develop a Random Subspace Ensemble With Python\" width=\"800\" height=\"537\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/How-to-Develop-a-Random-Subspace-Ensemble-With-Python.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/How-to-Develop-a-Random-Subspace-Ensemble-With-Python-300x201.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/How-to-Develop-a-Random-Subspace-Ensemble-With-Python-768x516.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-11151\" class=\"wp-caption-text\">How to Develop a Random Subspace Ensemble With Python<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/mars_\/18002370619\/\">Marsel Minga<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Random Subspace Ensemble<\/li>\n<li>Random Subspace Ensemble via Bagging\n<ol>\n<li>Random Subspace Ensemble for Classification<\/li>\n<li>Random Subspace Ensemble for Regression<\/li>\n<\/ol>\n<\/li>\n<li>Random Subspace Ensemble Hyperparameters\n<ol>\n<li>Explore Number of Trees<\/li>\n<li>Explore Number of Features<\/li>\n<li>Explore Alternate Algorithm<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Random Subspace Ensemble<\/h2>\n<p>A predictive modeling problem consists of one or more input variables and a target variable.<\/p>\n<p>A variable is a column in the data and is 
also often referred to as a feature. We can consider all input features together as defining an n-dimensional vector space, where n is the number of input features and each example (input row of data) is a point in the feature space.<\/p>\n<p>This is a common conceptualization in machine learning, and as input feature spaces become larger, the distances between points in the space increase, a difficulty known generally as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Curse_of_dimensionality\">curse of dimensionality<\/a>.<\/p>\n<p>A subset of input features can, therefore, be thought of as a subset of the input feature space, or a subspace.<\/p>\n<p>Selecting features is a way of defining a subspace of the input feature space. For example, <a href=\"https:\/\/machinelearningmastery.com\/feature-selection-with-real-and-categorical-data\/\">feature selection<\/a> refers to an attempt to reduce the number of dimensions of the input feature space by selecting a subset of features to keep or a subset of features to delete, often based on their relationship to the target variable.<\/p>\n<p>Alternatively, we can select random subsets of input features to define random subspaces. This can be used as the basis for an ensemble learning algorithm, where a model can be fit on each random subspace of features. This is referred to as a random subspace ensemble or the <strong>random subspace method<\/strong>.<\/p>\n<blockquote>\n<p>The training data is usually described by a set of features. Different subsets of features, or called subspaces, provide different views on the data. 
Therefore, individual learners trained from different subspaces are usually diverse.<\/p>\n<\/blockquote>\n<p>&mdash; Page 116, <a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/p>\n<p>It was proposed by Tin Kam Ho in the 1998 paper titled &ldquo;<a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/709601\/\">The Random Subspace Method For Constructing Decision Forests<\/a>&rdquo; where a decision tree is fit on each random subspace.<\/p>\n<p>More generally, it is a diversity technique for ensemble learning that belongs to a class of methods that change the training dataset for each model in an attempt to reduce the correlation between the predictions of the models in the ensemble.<\/p>\n<p>The procedure is as simple as selecting a random subset of input features (columns) for each model in the ensemble and fitting each model on its subset of features using all rows of the training dataset. It can be augmented with additional changes, such as using a bootstrap or random sample of the rows in the training dataset.<\/p>\n<blockquote>\n<p>The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/709601\/\">The Random Subspace Method For Constructing Decision Forests<\/a>, 1998.<\/p>\n<p>As such, the random subspace ensemble is related to bootstrap aggregation (bagging) that introduces diversity by training each model, often a decision tree, on a different random sample of the training dataset, with replacement (e.g. the bootstrap sampling method). 
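The procedure described above can be sketched directly with scikit-learn's decision tree, outside of the bagging classes. This is an illustrative sketch only; the member count, subspace size, and names such as `members` are our own choices, not part of any library API:

```python
# manual sketch of the random subspace procedure (illustrative only)
from numpy import argmax, bincount
from numpy.random import default_rng
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)
rng = default_rng(1)
members = []
for _ in range(10):
	# choose a random subspace of 10 of the 20 columns for this member
	cols = rng.choice(X.shape[1], size=10, replace=False)
	# fit the member on all rows, but only the chosen columns
	tree = DecisionTreeClassifier().fit(X[:, cols], y)
	members.append((cols, tree))
# combine member predictions by majority vote
votes = [tree.predict(X[:, cols]) for cols, tree in members]
yhat = [argmax(bincount(v)) for v in zip(*votes)]
```

The bagging classes used in this tutorial take care of the column sampling and the vote for us.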
The random forest ensemble may also be considered a hybrid of both the bagging and random subspace ensemble methods.<\/p>\n<blockquote>\n<p>Algorithms that use different feature subsets are commonly referred to as random subspace methods &hellip;<\/p>\n<\/blockquote>\n<p>&mdash; Page 21, <a href=\"https:\/\/amzn.to\/2C7syo5\">Ensemble Machine Learning<\/a>, 2012.<\/p>\n<p>The random subspace method can be used with any machine learning algorithm, although it is well suited to models that are sensitive to large changes to the input features, such as decision trees and k-nearest neighbors.<\/p>\n<p>It is appropriate for datasets that have a large number of input features, as it can result in good performance with good efficiency. If the dataset contains many irrelevant input features, it may be better to use feature selection as a data preparation technique as the prevalence of irrelevant features in subspaces may hurt the performance of the ensemble.<\/p>\n<blockquote>\n<p>For data with a lot of redundant features, training a learner in a subspace will be not only effective but also efficient.<\/p>\n<\/blockquote>\n<p>&mdash; Page 116, <a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/p>\n<p>Now that we are familiar with the random subspace ensemble, let&rsquo;s explore how we can implement the approach.<\/p>\n<h2>Random Subspace Ensemble via Bagging<\/h2>\n<p>We can implement the random subspace ensemble using bagging in scikit-learn.<\/p>\n<p>Bagging is provided via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingRegressor.html\">BaggingRegressor<\/a> and <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingClassifier.html\">BaggingClassifier<\/a> classes.<\/p>\n<p>We can configure bagging to be a random subspace ensemble by setting the &ldquo;<em>bootstrap<\/em>&rdquo; argument to &ldquo;<em>False<\/em>&rdquo; to turn off sampling of the training dataset rows and 
setting the maximum number of features to a given value via the &ldquo;<em>max_features<\/em>&rdquo; argument.<\/p>\n<p>The default model for bagging is a decision tree, but it can be changed to any model we like.<\/p>\n<p>We can demonstrate using bagging to implement a random subspace ensemble with decision trees for classification and regression.<\/p>\n<h3>Random Subspace Ensemble for Classification<\/h3>\n<p>In this section, we will look at developing a random subspace ensemble using bagging for a classification problem.<\/p>\n<p>First, we can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to create a synthetic binary classification problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># test classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 20) (1000,)<\/pre>\n<p>Next, we can configure a bagging model to be a random subspace ensemble for decision trees on this dataset.<\/p>\n<p>Each model will be fit on a random subspace of 10 input features, chosen arbitrarily.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the random subspace ensemble model\r\nmodel = BaggingClassifier(bootstrap=False, max_features=10)<\/pre>\n<p>We will evaluate the model using repeated stratified k-fold cross-validation, with three repeats and 10 folds. 
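As an aside, after a bagging model is fit, the columns drawn for each ensemble member are recorded in the `estimators_features_` attribute, which lets us confirm that each tree sees a different 10-feature subspace. A minimal sketch (the loop and variable names are our own):

```python
# inspect the random subspace (column indices) drawn for each member
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)
# define and fit the random subspace ensemble model
model = BaggingClassifier(bootstrap=False, max_features=10)
model.fit(X, y)
# report the 10 column indices seen by each ensemble member
for i, cols in enumerate(model.estimators_features_):
	print(i, sorted(cols))
```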
We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate random subspace ensemble via bagging for classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import BaggingClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n# define the random subspace ensemble model\r\nmodel = BaggingClassifier(bootstrap=False, max_features=10)\r\n# define the evaluation method\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model on the dataset\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation accuracy of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see the random subspace ensemble with default hyperparameters achieves a classification accuracy of about 85.4 percent on this test dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.854 (0.039)<\/pre>\n<p>We can also use the random subspace ensemble model as a final model and make predictions for classification.<\/p>\n<p>First, the ensemble is fit on all available data, then the <em>predict()<\/em> function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our binary classification dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># make predictions using random subspace ensemble via bagging for classification\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.ensemble import BaggingClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n# define the model\r\nmodel = BaggingClassifier(bootstrap=False, max_features=10)\r\n# fit the model on the whole dataset\r\nmodel.fit(X, y)\r\n# make a single prediction\r\nrow = [[-4.7705504,-1.88685058,-0.96057964,2.53850317,-6.5843005,3.45711663,-7.46225013,2.01338213,-0.45086384,-1.89314931,-2.90675203,-0.21214568,-0.9623956,3.93862591,0.06276375,0.33964269,4.0835676,1.31423977,-2.17983117,3.1047287]]\r\nyhat = model.predict(row)\r\nprint('Predicted Class: %d' % yhat[0])<\/pre>\n<p>Running the example fits the random subspace ensemble model on the entire dataset; the model is then used to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Predicted Class: 1<\/pre>\n<p>Now that we are familiar with using bagging for classification, let&rsquo;s look at the API for regression.<\/p>\n<h3>Random Subspace Ensemble for Regression<\/h3>\n<p>In 
this section, we will look at using bagging for a regression problem.<\/p>\n<p>First, we can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">make_regression() function<\/a> to create a synthetic regression problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># test regression dataset\r\nfrom sklearn.datasets import make_regression\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 20) (1000,)<\/pre>\n<p>Next, we can evaluate a random subspace ensemble via bagging on this dataset.<\/p>\n<p>As before, we must configure bagging to use all rows of the training dataset and specify the number of input features to randomly select.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the model\r\nmodel = BaggingRegressor(bootstrap=False, max_features=10)<\/pre>\n<p>As we did in the last section, we will evaluate the model using repeated k-fold cross-validation, with three repeats and 10 folds. We will report the mean absolute error (MAE) of the model across all repeats and folds. The scikit-learn library makes the MAE negative so that it is maximized instead of minimized. 
This means that a larger (closer to zero) negative MAE is better and a perfect model has a MAE of 0.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate random subspace ensemble via bagging for regression\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.ensemble import BaggingRegressor\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)\r\n# define the model\r\nmodel = BaggingRegressor(bootstrap=False, max_features=10)\r\n# define the evaluation procedure\r\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model\r\nn_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')\r\n# report performance\r\nprint('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation MAE of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the random subspace ensemble with default hyperparameters achieves a MAE of about 114.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">MAE: -114.630 (10.920)<\/pre>\n<p>We can also use the random subspace ensemble model as a final model and make predictions for regression.<\/p>\n<p>First, the ensemble is fit on all available data, then the predict() function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our regression dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># random subspace ensemble via bagging for making predictions for regression\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.ensemble import BaggingRegressor\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)\r\n# define the model\r\nmodel = BaggingRegressor(bootstrap=False, max_features=10)\r\n# fit the model on the whole dataset\r\nmodel.fit(X, y)\r\n# make a single prediction\r\nrow = [[0.88950817,-0.93540416,0.08392824,0.26438806,-0.52828711,-1.21102238,-0.4499934,1.47392391,-0.19737726,-0.22252503,0.02307668,0.26953276,0.03572757,-0.51606983,-0.39937452,1.8121736,-0.00775917,-0.02514283,-0.76089365,1.58692212]]\r\nyhat = model.predict(row)\r\nprint('Prediction: %d' % yhat[0])<\/pre>\n<p>Running the example fits the random subspace ensemble model on the entire dataset; the model is then used to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Prediction: -157<\/pre>\n<p>Now that we are familiar with using the scikit-learn API to evaluate and use random subspace ensembles, let&rsquo;s look at configuring the model.<\/p>\n<h2>Random Subspace Ensemble Hyperparameters<\/h2>\n<p>In this section, we will take a closer look at some of the 
hyperparameters you should consider tuning for the random subspace ensemble and their effect on model performance.<\/p>\n<h3>Explore Number of Trees<\/h3>\n<p>An important hyperparameter for the random subspace method is the number of decision trees used in the ensemble. More trees will reduce the variance of the ensemble's predictions, averaging out the diversity introduced by the random subset of features used by each tree.<\/p>\n<p>The number of trees can be set via the &ldquo;<em>n_estimators<\/em>&rdquo; argument and defaults to 10.<\/p>\n<p>The example below explores the effect of the number of trees with values from 10 to 5,000.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># explore random subspace ensemble number of trees effect on performance\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tn_trees = [10, 50, 100, 500, 1000, 5000]\r\n\tfor n in n_trees:\r\n\t\tmodels[str(n)] = BaggingClassifier(n_estimators=n, bootstrap=False, max_features=10)\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in 
models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean accuracy for each configured number of decision trees.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that performance appears to continue to improve as the number of ensemble members is increased to 5,000.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;10 0.853 (0.030)\r\n&gt;50 0.885 (0.038)\r\n&gt;100 0.891 (0.034)\r\n&gt;500 0.894 (0.036)\r\n&gt;1000 0.894 (0.034)\r\n&gt;5000 0.896 (0.033)<\/pre>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each configured number of trees.<\/p>\n<p>We can see the general trend of further improvement with the number of decision trees used in the ensemble.<\/p>\n<div id=\"attachment_11149\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11149\" loading=\"lazy\" class=\"size-full wp-image-11149\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy.png\" alt=\"Box Plot of Random Subspace Ensemble Size vs. 
Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-11149\" class=\"wp-caption-text\">Box Plot of Random Subspace Ensemble Size vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Number of Features<\/h3>\n<p>The number of features selected for each random subspace controls the diversity of the ensemble.<\/p>\n<p>Fewer features mean more diversity, whereas more features mean less diversity. 
More diversity may require more trees to reduce the variance of predictions made by the model.<\/p>\n<p>We can vary the diversity of the ensemble by varying the number of random features selected by setting the &ldquo;<em>max_features<\/em>&rdquo; argument.<\/p>\n<p>The example below varies the value from 1 to 20 with a fixed number of trees in the ensemble.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># explore random subspace ensemble number of features effect on performance\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tfor n in range(1,21):\r\n\t\tmodels[str(n)] = BaggingClassifier(n_estimators=100, bootstrap=False, max_features=n)\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example first reports 
the mean accuracy for each number of features.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that perhaps using 8 to 11 features in the random subspaces might be appropriate on this dataset when using 100 decision trees. This might suggest increasing the number of trees to a large value first, then tuning the number of features selected in each subset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;1 0.607 (0.036)\r\n&gt;2 0.771 (0.042)\r\n&gt;3 0.837 (0.036)\r\n&gt;4 0.858 (0.037)\r\n&gt;5 0.869 (0.034)\r\n&gt;6 0.883 (0.033)\r\n&gt;7 0.887 (0.038)\r\n&gt;8 0.894 (0.035)\r\n&gt;9 0.893 (0.035)\r\n&gt;10 0.885 (0.038)\r\n&gt;11 0.892 (0.034)\r\n&gt;12 0.883 (0.036)\r\n&gt;13 0.881 (0.044)\r\n&gt;14 0.875 (0.038)\r\n&gt;15 0.869 (0.041)\r\n&gt;16 0.861 (0.044)\r\n&gt;17 0.851 (0.041)\r\n&gt;18 0.831 (0.046)\r\n&gt;19 0.815 (0.046)\r\n&gt;20 0.801 (0.049)<\/pre>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each number of random subset features.<\/p>\n<p>We can see a general trend of increasing accuracy to a point and a steady decrease in performance after 11 features.<\/p>\n<div id=\"attachment_11150\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11150\" loading=\"lazy\" class=\"size-full wp-image-11150\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy.png\" alt=\"Box Plot of Random Subspace Ensemble Features vs. 
Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-11150\" class=\"wp-caption-text\">Box Plot of Random Subspace Ensemble Features vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Alternate Algorithm<\/h3>\n<p>Decision trees are the most common algorithm used in a random subspace ensemble.<\/p>\n<p>The reason for this is that they are easy to configure and work well on most problems.<\/p>\n<p>Other algorithms can be used to construct random subspaces and must be configured to have a modestly high variance. One example is the k-nearest neighbors algorithm where the <em>k<\/em> value can be set to a low value.<\/p>\n<p>The algorithm used in the ensemble is specified via the &ldquo;<em>base_estimator<\/em>&rdquo; argument and must be set to an instance of the algorithm and algorithm configuration to use.<\/p>\n<p>The example below demonstrates using a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\">KNeighborsClassifier<\/a> as the base algorithm used in the random subspace ensemble via the bagging class. 
Here, the algorithm is used with default hyperparameters where k is set to 5.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the model\r\nmodel = BaggingClassifier(base_estimator=KNeighborsClassifier(), bootstrap=False, max_features=10)<\/pre>\n<p>The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate random subspace ensemble with knn algorithm for classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.ensemble import BaggingClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\r\n# define the model\r\nmodel = BaggingClassifier(base_estimator=KNeighborsClassifier(), bootstrap=False, max_features=10)\r\n# define the evaluation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate the model\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example reports the mean and standard deviation accuracy of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the random subspace ensemble with KNN and default hyperparameters achieves a classification accuracy of about 90 percent on this test dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Accuracy: 0.901 (0.032)<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/709601\/\">The Random Subspace Method For Constructing Decision Forests<\/a>, 1998.<\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2zxc0F7\">Pattern Classification Using Ensemble Methods<\/a>, 2010.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2C7syo5\">Ensemble Machine Learning<\/a>, 2012.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingClassifier.html\">sklearn.ensemble.BaggingClassifier API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Random_subspace_method\">Random subspace method, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop random subspace ensembles for classification and regression.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Random subspace ensembles are created from decision trees fit on different samples of features (columns) in the training dataset.<\/li>\n<li>How to use the random subspace ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of random subspace model hyperparameters on model performance.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a 
rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/random-subspace-ensemble-with-python\/\">How to Develop a Random Subspace Ensemble With Python<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/random-subspace-ensemble-with-python\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Random Subspace Ensemble is a machine learning algorithm that combines the predictions from multiple decision trees trained on different subsets of columns [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/29\/how-to-develop-a-random-subspace-ensemble-with-python\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4028,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4027"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4027"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4027\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-jso
n\/wp\/v2\/media\/4028"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4027"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4027"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4027"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}