{"id":2917,"date":"2019-12-12T18:00:48","date_gmt":"2019-12-12T18:00:48","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/12\/tune-hyperparameters-for-classification-machine-learning-algorithms\/"},"modified":"2019-12-12T18:00:48","modified_gmt":"2019-12-12T18:00:48","slug":"tune-hyperparameters-for-classification-machine-learning-algorithms","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/12\/tune-hyperparameters-for-classification-machine-learning-algorithms\/","title":{"rendered":"Tune Hyperparameters for Classification Machine Learning Algorithms"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset.<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/difference-between-a-parameter-and-a-hyperparameter\/\">Hyperparameters<\/a> are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model.<\/p>\n<p>Typically, it is challenging to know what values to use for the hyperparameters of a given algorithm on a given dataset, therefore it is common to use random or grid search strategies for different hyperparameter values.<\/p>\n<p>The more hyperparameters of an algorithm that you need to tune, the slower the tuning process. Therefore, it is desirable to select a minimum subset of model hyperparameters to search or tune.<\/p>\n<p>Not all model hyperparameters are equally important. 
Some hyperparameters have an outsized effect on the behavior, and in turn, the performance of a machine learning algorithm.<\/p>\n<p>As a machine learning practitioner, you must know which hyperparameters to focus on to get a good result quickly.<\/p>\n<p>In this tutorial, you will discover those hyperparameters that are most important for some of the top machine learning algorithms.<\/p>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_9208\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9208\" class=\"size-full wp-image-9208\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/10\/Hyperparameters-for-Classification-Machine-Learning-Algorithms.jpg\" alt=\"Hyperparameters for Classification Machine Learning Algorithms\" width=\"800\" height=\"500\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/10\/Hyperparameters-for-Classification-Machine-Learning-Algorithms.jpg 800w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/10\/Hyperparameters-for-Classification-Machine-Learning-Algorithms-300x188.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/10\/Hyperparameters-for-Classification-Machine-Learning-Algorithms-768x480.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-9208\" class=\"wp-caption-text\">Hyperparameters for Classification Machine Learning Algorithms<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/shuttermonkey\/4934194353\/\">shuttermonkey<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Classification Algorithms Overview<\/h2>\n<p>We will take a closer look at the important hyperparameters of the top machine learning algorithms that you may use for classification.<\/p>\n<p>We will look at the hyperparameters you need to focus on and suggested values to try when tuning the model on your dataset.<\/p>\n<p>The suggestions are based both 
on advice from textbooks on the algorithms and practical advice suggested by practitioners, as well as a little of my own experience.<\/p>\n<p>The seven classification algorithms we will look at are as follows:<\/p>\n<ol>\n<li>Logistic Regression<\/li>\n<li>Ridge Classifier<\/li>\n<li>K-Nearest Neighbors (KNN)<\/li>\n<li>Support Vector Machine (SVM)<\/li>\n<li>Bagged Decision Trees (Bagging)<\/li>\n<li>Random Forest<\/li>\n<li>Stochastic Gradient Boosting<\/li>\n<\/ol>\n<p>We will consider these algorithms in the context of their scikit-learn implementation (Python); nevertheless, you can use the same hyperparameter suggestions with other platforms, such as Weka and R.<\/p>\n<p>A small grid searching example is also given for each algorithm that you can use as a starting point for your own classification predictive modeling project.<\/p>\n<p><strong>Note<\/strong>: if you have had success with different hyperparameter values or even different hyperparameters than those suggested in this tutorial, let me know in the comments below. 
I\u2019d love to hear about it.<\/p>\n<p>Let\u2019s dive in.<\/p>\n<h2>Logistic Regression<\/h2>\n<p>Logistic regression does not really have any critical hyperparameters to tune.<\/p>\n<p>Sometimes, you can see useful differences in performance or convergence with different solvers (<em>solver<\/em>).<\/p>\n<ul>\n<li><strong>solver<\/strong> in [\u2018newton-cg\u2019, \u2018lbfgs\u2019, \u2018liblinear\u2019, \u2018sag\u2019, \u2018saga\u2019]<\/li>\n<\/ul>\n<p>Regularization (<em>penalty<\/em>) can sometimes be helpful.<\/p>\n<ul>\n<li><strong>penalty<\/strong> in [\u2018none\u2019, \u2018l1\u2019, \u2018l2\u2019, \u2018elasticnet\u2019]<\/li>\n<\/ul>\n<p><strong>Note<\/strong>: not all solvers support all regularization terms.<\/p>\n<p>The <em>C<\/em> parameter controls the penalty strength, which can also be effective.<\/p>\n<ul>\n<li><strong>C<\/strong> in [100, 10, 1.0, 0.1, 0.01]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">sklearn.linear_model.LogisticRegression API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for LogisticRegression on a synthetic binary classification dataset.<\/p>\n<p>Some combinations were omitted to cut back on the warnings\/errors.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for logistic regression\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.linear_model import LogisticRegression\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = LogisticRegression()\r\nsolvers = ['newton-cg', 'lbfgs', 'liblinear']\r\npenalty = ['l2']\r\nc_values = [100, 10, 1.0, 0.1, 0.01]\r\n# define grid 
search\r\ngrid = dict(solver=solvers,penalty=penalty,C=c_values)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.945333 using {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}\r\n0.936333 (0.016829) with: {'C': 100, 'penalty': 'l2', 'solver': 'newton-cg'}\r\n0.937667 (0.017259) with: {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}\r\n0.938667 (0.015861) with: {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}\r\n0.936333 (0.017413) with: {'C': 10, 'penalty': 'l2', 'solver': 'newton-cg'}\r\n0.938333 (0.017904) with: {'C': 10, 'penalty': 'l2', 'solver': 'lbfgs'}\r\n0.939000 (0.016401) with: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}\r\n0.937333 (0.017114) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'newton-cg'}\r\n0.939000 (0.017195) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'lbfgs'}\r\n0.939000 (0.015780) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}\r\n0.940000 (0.015706) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}\r\n0.940333 (0.014941) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}\r\n0.941000 (0.017000) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}\r\n0.943000 (0.016763) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'newton-cg'}\r\n0.943000 (0.016763) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'lbfgs'}\r\n0.945333 
(0.017651) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}<\/pre>\n<\/p>\n<h2>Ridge Classifier<\/h2>\n<p>Ridge regression is a penalized linear regression model for predicting a numerical value.<\/p>\n<p>Nevertheless, it can be very effective when applied to classification.<\/p>\n<p>Perhaps the most important parameter to tune is the regularization strength (<em>alpha<\/em>). A good starting point might be values in the range [0.1 to 1.0].<\/p>\n<ul>\n<li><strong>alpha<\/strong> in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.RidgeClassifier.html\">sklearn.linear_model.RidgeClassifier API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for RidgeClassifier on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for ridge classifier\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.linear_model import RidgeClassifier\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = RidgeClassifier()\r\nalpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n# define grid search\r\ngrid = dict(alpha=alpha)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = 
grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.974667 using {'alpha': 0.1}\r\n0.974667 (0.014545) with: {'alpha': 0.1}\r\n0.974667 (0.014545) with: {'alpha': 0.2}\r\n0.974667 (0.014545) with: {'alpha': 0.3}\r\n0.974667 (0.014545) with: {'alpha': 0.4}\r\n0.974667 (0.014545) with: {'alpha': 0.5}\r\n0.974667 (0.014545) with: {'alpha': 0.6}\r\n0.974667 (0.014545) with: {'alpha': 0.7}\r\n0.974667 (0.014545) with: {'alpha': 0.8}\r\n0.974667 (0.014545) with: {'alpha': 0.9}\r\n0.974667 (0.014545) with: {'alpha': 1.0}<\/pre>\n<\/p>\n<h2>K-Nearest Neighbors (KNN)<\/h2>\n<p>The most important hyperparameter for KNN is the number of neighbors (<em>n_neighbors<\/em>).<\/p>\n<p>Test values between at least 1 and 21, perhaps just the odd numbers.<\/p>\n<ul>\n<li><strong>n_neighbors<\/strong> in [1 to 21]<\/li>\n<\/ul>\n<p>It may also be interesting to test different distance metrics (<em>metric<\/em>) for choosing the composition of the neighborhood.<\/p>\n<ul>\n<li><strong>metric<\/strong> in [\u2018euclidean\u2019, \u2018manhattan\u2019, \u2018minkowski\u2019]<\/li>\n<\/ul>\n<p>For a fuller list see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.DistanceMetric.html\">sklearn.neighbors.DistanceMetric API<\/a><\/li>\n<\/ul>\n<p>It may also be interesting to test the contribution of members of the neighborhood via different weightings (<em>weights<\/em>).<\/p>\n<ul>\n<li><strong>weights<\/strong> in [\u2018uniform\u2019, \u2018distance\u2019]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\">sklearn.neighbors.KNeighborsClassifier API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for KNeighborsClassifier on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for KNeighborsClassifier\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = KNeighborsClassifier()\r\nn_neighbors = range(1, 21, 2)\r\nweights = ['uniform', 'distance']\r\nmetric = ['euclidean', 'manhattan', 'minkowski']\r\n# define grid search\r\ngrid = dict(n_neighbors=n_neighbors,weights=weights,metric=metric)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.937667 using {'metric': 'manhattan', 'n_neighbors': 13, 'weights': 'uniform'}\r\n0.833667 (0.031674) with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'uniform'}\r\n0.833667 (0.031674) 
with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'distance'}\r\n0.895333 (0.030081) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'uniform'}\r\n0.895333 (0.030081) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'distance'}\r\n0.909000 (0.021810) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}\r\n0.909000 (0.021810) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}\r\n0.925333 (0.020774) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'uniform'}\r\n0.925333 (0.020774) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'distance'}\r\n0.929000 (0.027368) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'uniform'}\r\n0.929000 (0.027368) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'distance'}\r\n...<\/pre>\n<\/p>\n<h2>Support Vector Machine (SVM)<\/h2>\n<p>The SVM algorithm, like gradient boosting, is very popular, very effective, and provides a large number of hyperparameters to tune.<\/p>\n<p>Perhaps the first important parameter is the choice of kernel that will control the manner in which the input variables will be projected. There are many to choose from, but linear, polynomial, and RBF are the most common, perhaps just linear and RBF in practice.<\/p>\n<ul>\n<li><strong>kernels<\/strong> in [\u2018linear\u2019, \u2018poly\u2019, \u2018rbf\u2019, \u2018sigmoid\u2019]<\/li>\n<\/ul>\n<p>If the polynomial kernel works out, then it is a good idea to dive into the degree hyperparameter.<\/p>\n<p>Another critical parameter is the penalty (<em>C<\/em>) that can take on a range of values and has a dramatic effect on the shape of the resulting regions for each class. 
A log scale might be a good starting point.<\/p>\n<ul>\n<li><strong>C<\/strong> in [100, 10, 1.0, 0.1, 0.01]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.svm.SVC.html\">sklearn.svm.SVC API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for SVC on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for SVC\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.svm import SVC\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define model and parameters\r\nmodel = SVC()\r\nkernel = ['poly', 'rbf', 'sigmoid']\r\nC = [50, 10, 1.0, 0.1, 0.01]\r\ngamma = ['scale']\r\n# define grid search\r\ngrid = dict(kernel=kernel,C=C,gamma=gamma)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.974333 using {'C': 1.0, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.973667 (0.012512) with: {'C': 50, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.970667 (0.018062) with: {'C': 50, 'gamma': 
'scale', 'kernel': 'rbf'}\r\n0.945333 (0.024594) with: {'C': 50, 'gamma': 'scale', 'kernel': 'sigmoid'}\r\n0.973667 (0.012512) with: {'C': 10, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.970667 (0.018062) with: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}\r\n0.957000 (0.016763) with: {'C': 10, 'gamma': 'scale', 'kernel': 'sigmoid'}\r\n0.974333 (0.012565) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.971667 (0.016948) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}\r\n0.966333 (0.016224) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'sigmoid'}\r\n0.972333 (0.013585) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.974000 (0.013317) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'}\r\n0.971667 (0.015934) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'sigmoid'}\r\n0.972333 (0.013585) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'poly'}\r\n0.973667 (0.014716) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'rbf'}\r\n0.974333 (0.013828) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'sigmoid'}<\/pre>\n<\/p>\n<h2>Bagged Decision Trees (Bagging)<\/h2>\n<p>The most important parameter for bagged decision trees is the number of trees (<em>n_estimators<\/em>).<\/p>\n<p>Ideally, this should be increased until no further improvement is seen in the model.<\/p>\n<p>Good values might be a log scale from 10 to 1,000.<\/p>\n<ul>\n<li><strong>n_estimators<\/strong> in [10, 100, 1000]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingClassifier.html\">sklearn.ensemble.BaggingClassifier API<\/a><\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for BaggingClassifier on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for BaggingClassifier\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection 
import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.ensemble import BaggingClassifier\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = BaggingClassifier()\r\nn_estimators = [10, 100, 1000]\r\n# define grid search\r\ngrid = dict(n_estimators=n_estimators)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.873667 using {'n_estimators': 1000}\r\n0.839000 (0.038588) with: {'n_estimators': 10}\r\n0.869333 (0.030434) with: {'n_estimators': 100}\r\n0.873667 (0.035070) with: {'n_estimators': 1000}<\/pre>\n<\/p>\n<h2>Random Forest<\/h2>\n<p>The most important parameter is the number of random features to sample at each split point (<em>max_features<\/em>).<\/p>\n<p>You could try a range of integer values, such as 1 to 20, or 1 to half the number of input features.<\/p>\n<ul>\n<li><strong>max_features<\/strong> in [1 to 20]<\/li>\n<\/ul>\n<p>Alternatively, you could try a suite of different default value calculators.<\/p>\n<ul>\n<li><strong>max_features<\/strong> in [\u2018sqrt\u2019, \u2018log2\u2019]<\/li>\n<\/ul>\n<p>Another important parameter for random forest is the number of trees (<em>n_estimators<\/em>).<\/p>\n<p>Ideally, this should be 
increased until no further improvement is seen in the model.<\/p>\n<p>Good values might be a log scale from 10 to 1,000.<\/p>\n<ul>\n<li><strong>n_estimators<\/strong> in [10, 100, 1000]<\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.RandomForestClassifier.html\">sklearn.ensemble.RandomForestClassifier API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for RandomForestClassifier on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for RandomForestClassifier\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.ensemble import RandomForestClassifier\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = RandomForestClassifier()\r\nn_estimators = [10, 100, 1000]\r\nmax_features = ['sqrt', 'log2']\r\n# define grid search\r\ngrid = dict(n_estimators=n_estimators,max_features=max_features)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre 
class=\"crayon-plain-tag\">Best: 0.952000 using {'max_features': 'log2', 'n_estimators': 1000}\r\n0.841000 (0.032078) with: {'max_features': 'sqrt', 'n_estimators': 10}\r\n0.938333 (0.020830) with: {'max_features': 'sqrt', 'n_estimators': 100}\r\n0.944667 (0.024998) with: {'max_features': 'sqrt', 'n_estimators': 1000}\r\n0.817667 (0.033235) with: {'max_features': 'log2', 'n_estimators': 10}\r\n0.940667 (0.021592) with: {'max_features': 'log2', 'n_estimators': 100}\r\n0.952000 (0.019562) with: {'max_features': 'log2', 'n_estimators': 1000}<\/pre>\n<\/p>\n<h2>Stochastic Gradient Boosting<\/h2>\n<p>This algorithm is also called the Gradient Boosting Machine (GBM), or it may be named for a specific implementation, such as XGBoost.<\/p>\n<p>The gradient boosting algorithm has many parameters to tune.<\/p>\n<p>There are some parameter pairings that are important to consider. The first is the learning rate, also called shrinkage or eta (<em>learning_rate<\/em>), and the number of trees in the model (<em>n_estimators<\/em>). Both could be considered on a log scale, although in different directions.<\/p>\n<ul>\n<li><strong>learning_rate<\/strong> in [0.001, 0.01, 0.1]<\/li>\n<li><strong>n_estimators<\/strong> in [10, 100, 1000]<\/li>\n<\/ul>\n<p>Another pairing is the number of rows or subset of the data to consider for each tree (<em>subsample<\/em>) and the depth of each tree (<em>max_depth<\/em>). 
These could be grid searched at intervals of 0.1 and 1, respectively, although common values can be tested directly.<\/p>\n<ul>\n<li><strong>subsample<\/strong> in [0.5, 0.7, 1.0]<\/li>\n<li><strong>max_depth<\/strong> in [3, 7, 9]<\/li>\n<\/ul>\n<p>For more detailed advice on tuning the XGBoost implementation, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/configure-gradient-boosting-algorithm\/\">How to Configure the Gradient Boosting Algorithm<\/a><\/li>\n<\/ul>\n<p>For the full list of hyperparameters, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.GradientBoostingClassifier.html\">sklearn.ensemble.GradientBoostingClassifier API<\/a>.<\/li>\n<\/ul>\n<p>The example below demonstrates grid searching the key hyperparameters for GradientBoostingClassifier on a synthetic binary classification dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of grid searching key hyperparameters for GradientBoostingClassifier\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.ensemble import GradientBoostingClassifier\r\n# define dataset\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)\r\n# define models and parameters\r\nmodel = GradientBoostingClassifier()\r\nn_estimators = [10, 100, 1000]\r\nlearning_rate = [0.001, 0.01, 0.1]\r\nsubsample = [0.5, 0.7, 1.0]\r\nmax_depth = [3, 7, 9]\r\n# define grid search\r\ngrid = dict(learning_rate=learning_rate, n_estimators=n_estimators, subsample=subsample, max_depth=max_depth)\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\ngrid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)\r\ngrid_result = grid_search.fit(X, y)\r\n# summarize results\r\nprint(\"Best: %f using %s\" % (grid_result.best_score_, 
grid_result.best_params_))\r\nmeans = grid_result.cv_results_['mean_test_score']\r\nstds = grid_result.cv_results_['std_test_score']\r\nparams = grid_result.cv_results_['params']\r\nfor mean, stdev, param in zip(means, stds, params):\r\n    print(\"%f (%f) with: %r\" % (mean, stdev, param))<\/pre>\n<p>Running the example prints the best result as well as the results from all combinations evaluated.<\/p>\n<pre class=\"crayon-plain-tag\">Best: 0.936667 using {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.5}\r\n0.803333 (0.042058) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.5}\r\n0.783667 (0.042386) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.7}\r\n0.711667 (0.041157) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 1.0}\r\n0.832667 (0.040244) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.5}\r\n0.809667 (0.040040) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.7}\r\n0.741333 (0.043261) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}\r\n0.881333 (0.034130) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.5}\r\n0.866667 (0.035150) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.7}\r\n0.838333 (0.037424) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 1.0}\r\n0.838333 (0.036614) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 0.5}\r\n0.821667 (0.040586) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 0.7}\r\n0.729000 (0.035903) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 1.0}\r\n0.884667 (0.036854) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.5}\r\n0.871333 (0.035094) with: 
{'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.7}\r\n0.729000 (0.037625) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 1.0}\r\n0.905667 (0.033134) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 1000, 'subsample': 0.5}\r\n...<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/classes.html\">scikit-learn API<\/a><\/li>\n<li><a href=\"https:\/\/topepo.github.io\/caret\/available-models.html\">Caret List of Algorithms and Tuning Parameters<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the top hyperparameters and how to configure them for top machine learning algorithms.<\/p>\n<p>Do you have other hyperparameter suggestions? Let me know in the comments below.<\/p>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/hyperparameters-for-classification-machine-learning-algorithms\/\">Tune Hyperparameters for Classification Machine Learning Algorithms<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/hyperparameters-for-classification-machine-learning-algorithms\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset. 
Hyperparameters are different [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/12\/tune-hyperparameters-for-classification-machine-learning-algorithms\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2918,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2917"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2917"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2917\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2918"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2917"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2917"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2917"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}