{"id":4511,"date":"2021-03-25T06:28:59","date_gmt":"2021-03-25T06:28:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/25\/how-to-manually-optimize-machine-learning-model-hyperparameters\/"},"modified":"2021-03-25T06:28:59","modified_gmt":"2021-03-25T06:28:59","slug":"how-to-manually-optimize-machine-learning-model-hyperparameters","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/25\/how-to-manually-optimize-machine-learning-model-hyperparameters\/","title":{"rendered":"How to Manually Optimize Machine Learning Model Hyperparameters"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Machine learning algorithms have <a href=\"https:\/\/machinelearningmastery.com\/difference-between-a-parameter-and-a-hyperparameter\/\">hyperparameters<\/a> that allow the algorithms to be tailored to specific datasets.<\/p>\n<p>Although the impact of hyperparameters may be understood generally, their specific effect on a dataset and their interactions during learning may not be known. Therefore, it is important to tune the values of algorithm hyperparameters as part of a machine learning project.<\/p>\n<p>It is common to use naive optimization algorithms to tune hyperparameters, such as a grid search and a random search. 
An alternate approach is to use a stochastic optimization algorithm, like a stochastic hill climbing algorithm.<\/p>\n<p>In this tutorial, you will discover how to manually optimize the hyperparameters of machine learning algorithms.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Stochastic optimization algorithms can be used instead of grid and random search for hyperparameter optimization.<\/li>\n<li>How to use a stochastic hill climbing algorithm to tune the hyperparameters of the Perceptron algorithm.<\/li>\n<li>How to manually optimize the hyperparameters of the XGBoost gradient boosting algorithm.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_11953\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-11953\" loading=\"lazy\" class=\"size-full wp-image-11953\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Machine-Learning-Model-Hyperparameters.jpg\" alt=\"How to Manually Optimize Machine Learning Model Hyperparameters\" width=\"799\" height=\"533\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Machine-Learning-Model-Hyperparameters.jpg 799w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Machine-Learning-Model-Hyperparameters-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Machine-Learning-Model-Hyperparameters-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-11953\" class=\"wp-caption-text\">How to Manually Optimize Machine Learning Model Hyperparameters<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/jfmacdonald\/19867924249\/\">john farrell macdonald<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; 
they are:<\/p>\n<ol>\n<li>Manual Hyperparameter Optimization<\/li>\n<li>Perceptron Hyperparameter Optimization<\/li>\n<li>XGBoost Hyperparameter Optimization<\/li>\n<\/ol>\n<h2>Manual Hyperparameter Optimization<\/h2>\n<p>Machine learning models have hyperparameters that you must set in order to customize the model to your dataset.<\/p>\n<p>Often, the general effects of hyperparameters on a model are known, but how to best set a hyperparameter and combinations of interacting hyperparameters for a given dataset is challenging.<\/p>\n<p>A better approach is to objectively search different values for model hyperparameters and choose a subset that results in a model that achieves the best performance on a given dataset. This is called hyperparameter optimization, or hyperparameter tuning.<\/p>\n<p>A range of different optimization algorithms may be used, although two of the simplest and most common methods are random search and grid search.<\/p>\n<ul>\n<li>\n<strong>Random Search<\/strong>. Define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain.<\/li>\n<li>\n<strong>Grid Search<\/strong>. Define a search space as a grid of hyperparameter values and evaluate every position in the grid.<\/li>\n<\/ul>\n<p>Grid search is great for spot-checking combinations that are known to perform well generally. 
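<\/p>
<p>As a minimal illustration of the two strategies, consider a hypothetical stand-in objective function (a simple surface with a known peak, not a model evaluation); the grid evaluates every combination, while random search samples the bounded domain:<\/p>

```python
# Contrast grid search and random search over a two-dimensional
# hyperparameter space. The objective() here is a hypothetical stand-in
# surface with its maximum at (0.3, 0.1), not a model evaluation.
from itertools import product
from random import seed, uniform

def objective(eta, alpha):
	# higher is better; the peak is at eta=0.3, alpha=0.1
	return -((eta - 0.3) ** 2 + (alpha - 0.1) ** 2)

# grid search: evaluate every position in a predefined grid
etas = [0.01, 0.1, 0.3, 1.0]
alphas = [0.0, 0.1, 0.5, 1.0]
grid_best = max(product(etas, alphas), key=lambda cfg: objective(*cfg))
print('grid best:', grid_best)  # -> grid best: (0.3, 0.1)

# random search: draw the same number of points uniformly from the domain
seed(1)
samples = [(uniform(0, 1), uniform(0, 1)) for _ in range(16)]
rand_best = max(samples, key=lambda cfg: objective(*cfg))
print('random best:', rand_best)
```

<p>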
Random search is great for discovery and getting hyperparameter combinations that you would not have guessed intuitively, although it often requires more time to execute.<\/p>\n<p>For more on grid and random search for hyperparameter tuning, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/hyperparameter-optimization-with-random-search-and-grid-search\/\">Hyperparameter Optimization With Random Search and Grid Search<\/a><\/li>\n<\/ul>\n<p>Grid and random search are primitive optimization algorithms, and it is possible to use any optimization algorithm we like to tune the performance of a machine learning algorithm. For example, it is possible to use stochastic optimization algorithms. This might be desirable when good or great performance is required and there are sufficient resources available to tune the model.<\/p>\n<p>Next, let\u2019s look at how we might use a stochastic hill climbing algorithm to tune the performance of the Perceptron algorithm.<\/p>\n<h2>Perceptron Hyperparameter Optimization<\/h2>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/implement-perceptron-algorithm-scratch-python\/\">Perceptron algorithm<\/a> is the simplest type of artificial neural network.<\/p>\n<p>It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.<\/p>\n<p>In this section, we will explore how to manually optimize the hyperparameters of the Perceptron model.<\/p>\n<p>First, let\u2019s define a synthetic binary classification problem that we can use as the focus of optimizing the model.<\/p>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to define a binary classification problem with 1,000 rows and five input variables.<\/p>\n<p>The example below creates the dataset and summarizes the shape of the data.<\/p>\n<pre 
class=\"urvanov-syntax-highlighter-plain-tag\"># define a binary classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# summarize the shape of the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example prints the shape of the created dataset, confirming our expectations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 5) (1000,)<\/pre>\n<p>The scikit-learn library provides an implementation of the Perceptron model via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.Perceptron.html\">Perceptron class<\/a>.<\/p>\n<p>Before we tune the hyperparameters of the model, we can establish a baseline in performance using the default hyperparameters.<\/p>\n<p>We will evaluate the model using good practices of <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">repeated stratified k-fold cross-validation<\/a> via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedStratifiedKFold.html\">RepeatedStratifiedKFold class<\/a>.<\/p>\n<p>The complete example of evaluating the Perceptron model with default hyperparameters on our synthetic binary classification dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># perceptron default hyperparameters for binary classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.linear_model import Perceptron\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# define model\r\nmodel = Perceptron()\r\n# define evaluation procedure\r\ncv = 
RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report result\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 78.5 percent.<\/p>\n<p>We would hope that we can achieve better performance than this with optimized hyperparameters.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.786 (0.069)<\/pre>\n<p>Next, we can optimize the hyperparameters of the Perceptron model using a stochastic hill climbing algorithm.<\/p>\n<p>There are many hyperparameters that we could optimize, although we will focus on two that perhaps have the most impact on the learning behavior of the model; they are:<\/p>\n<ul>\n<li>Learning Rate (<em>eta0<\/em>).<\/li>\n<li>Regularization (<em>alpha<\/em>).<\/li>\n<\/ul>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks\/\">learning rate<\/a> controls the amount the model is updated based on prediction errors and controls the speed of learning. The default value of <em>eta0<\/em> is 1.0. Reasonable values are larger than zero (e.g. 
larger than 1e-8 or 1e-10) and probably less than 1.0.<\/p>\n<p>By default, the Perceptron does not use any regularization, but we will enable \u201c<em>elastic net<\/em>\u201d regularization, which applies both <a href=\"https:\/\/machinelearningmastery.com\/weight-regularization-to-reduce-overfitting-of-deep-learning-models\/\">L1 and L2 regularization<\/a> during learning. This will encourage the model to seek small model weights and, in turn, often better performance.<\/p>\n<p>We will tune the \u201c<em>alpha<\/em>\u201d hyperparameter that controls the weighting of the regularization, e.g. the amount it impacts the learning. If set to 0.0, it is as though no regularization is being used. Reasonable values are between 0.0 and 1.0.<\/p>\n<p>First, we need to define the objective function for the optimization algorithm. We will evaluate a configuration using mean classification accuracy with repeated stratified k-fold cross-validation. We will seek the configuration that maximizes accuracy.<\/p>\n<p>The <em>objective()<\/em> function below implements this, taking the dataset and a list of config values. The config values (learning rate and regularization weighting) are unpacked and used to configure the model, which is then evaluated, and the mean accuracy is returned.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(X, y, cfg):\r\n\t# unpack config\r\n\teta, alpha = cfg\r\n\t# define model\r\n\tmodel = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)\r\n\t# define evaluation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate model\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# calculate mean accuracy\r\n\tresult = mean(scores)\r\n\treturn result<\/pre>\n<p>Next, we need a function to take a step in the search space.<\/p>\n<p>The search space is defined by two variables (<em>eta<\/em> and <em>alpha<\/em>). 
A step in the search space must have some relationship to the previous values and must be bound to sensible values (e.g. between 0 and 1).<\/p>\n<p>We will use a \u201c<em>step size<\/em>\u201d hyperparameter that controls how far the algorithm is allowed to move from the existing configuration. A new configuration will be chosen probabilistically using a Gaussian distribution with the current value as the mean of the distribution and the step size as the standard deviation of the distribution.<\/p>\n<p>We can use the <a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.randn.html\">randn() NumPy function<\/a> to generate random numbers with a Gaussian distribution.<\/p>\n<p>The <em>step()<\/em> function below implements this and will take a step in the search space and generate a new configuration using an existing configuration.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># take a step in the search space\r\ndef step(cfg, step_size):\r\n\t# unpack the configuration\r\n\teta, alpha = cfg\r\n\t# step eta\r\n\tnew_eta = eta + randn() * step_size\r\n\t# check the bounds of eta\r\n\tif new_eta &lt;= 0.0:\r\n\t\tnew_eta = 1e-8\r\n\t# step alpha\r\n\tnew_alpha = alpha + randn() * step_size\r\n\t# check the bounds of alpha\r\n\tif new_alpha &lt; 0.0:\r\n\t\tnew_alpha = 0.0\r\n\t# return the new configuration\r\n\treturn [new_eta, new_alpha]<\/pre>\n<p>Next, we need to implement the <a href=\"https:\/\/machinelearningmastery.com\/stochastic-hill-climbing-in-python-from-scratch\/\">stochastic hill climbing algorithm<\/a> that will call our <em>objective()<\/em> function to evaluate candidate solutions and our <em>step()<\/em> function to take a step in the search space.<\/p>\n<p>The search first generates a random initial solution, in this case with eta and alpha values in the range 0 and 1. 
The initial solution is then evaluated and is taken as the current best working solution.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# starting point for the search\r\nsolution = [rand(), rand()]\r\n# evaluate the initial point\r\nsolution_eval = objective(X, y, solution)<\/pre>\n<p>Next, the algorithm iterates for a fixed number of iterations provided as a hyperparameter to the search. Each iteration involves taking a step and evaluating the new candidate solution.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# take a step\r\ncandidate = step(solution, step_size)\r\n# evaluate candidate point\r\ncandidte_eval = objective(X, y, candidate)<\/pre>\n<p>If the new solution is better than the current working solution, it is taken as the new current working solution.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# check if we should keep the new point\r\nif candidte_eval &gt;= solution_eval:\r\n\t# store the new point\r\n\tsolution, solution_eval = candidate, candidte_eval\r\n\t# report progress\r\n\tprint('&gt;%d, cfg=%s %.5f' % (i, solution, solution_eval))<\/pre>\n<p>At the end of the search, the best solution and its performance are then returned.<\/p>\n<p>Tying this together, the <em>hillclimbing()<\/em> function below implements the stochastic hill climbing algorithm for tuning the Perceptron algorithm, taking the dataset, objective function, number of iterations, and step size as arguments.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, n_iter, step_size):\r\n\t# starting point for the search\r\n\tsolution = [rand(), rand()]\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = step(solution, step_size)\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# 
check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d, cfg=%s %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>We can then call the algorithm and report the results of the search.<\/p>\n<p>In this case, we will run the algorithm for 100 iterations and use a step size of 0.1, chosen after a little trial and error.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the total iterations\r\nn_iter = 100\r\n# step size in the search space\r\nstep_size = 0.1\r\n# perform the hill climbing search\r\ncfg, score = hillclimbing(X, y, objective, n_iter, step_size)\r\nprint('Done!')\r\nprint('cfg=%s: Mean Accuracy: %f' % (cfg, score))<\/pre>\n<p>Tying this together, the complete example of manually tuning the Perceptron algorithm is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># manually search perceptron hyperparameters for binary classification\r\nfrom numpy import mean\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.linear_model import Perceptron\r\n\r\n# objective function\r\ndef objective(X, y, cfg):\r\n\t# unpack config\r\n\teta, alpha = cfg\r\n\t# define model\r\n\tmodel = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)\r\n\t# define evaluation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate model\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# calculate mean accuracy\r\n\tresult = mean(scores)\r\n\treturn result\r\n\r\n# take a step in the search space\r\ndef step(cfg, step_size):\r\n\t# unpack the 
configuration\r\n\teta, alpha = cfg\r\n\t# step eta\r\n\tnew_eta = eta + randn() * step_size\r\n\t# check the bounds of eta\r\n\tif new_eta &lt;= 0.0:\r\n\t\tnew_eta = 1e-8\r\n\t# step alpha\r\n\tnew_alpha = alpha + randn() * step_size\r\n\t# check the bounds of alpha\r\n\tif new_alpha &lt; 0.0:\r\n\t\tnew_alpha = 0.0\r\n\t# return the new configuration\r\n\treturn [new_eta, new_alpha]\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, n_iter, step_size):\r\n\t# starting point for the search\r\n\tsolution = [rand(), rand()]\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = step(solution, step_size)\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d, cfg=%s %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# define the total iterations\r\nn_iter = 100\r\n# step size in the search space\r\nstep_size = 0.1\r\n# perform the hill climbing search\r\ncfg, score = hillclimbing(X, y, objective, n_iter, step_size)\r\nprint('Done!')\r\nprint('cfg=%s: Mean Accuracy: %f' % (cfg, score))<\/pre>\n<p>Running the example reports the configuration and result each time an improvement is seen during the search. 
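<\/p>
<p>Because the starting point and every step draw from NumPy\u2019s global random number generator, each run follows a different search path. If a repeatable run is desired, one option (an illustrative addition, not part of the listing above) is to seed the generator before calling <em>hillclimbing()<\/em>:<\/p>

```python
# Seeding NumPy's global random number generator makes a stochastic
# search repeatable from run to run (illustrative addition; the
# tutorial's listing does not set a seed).
from numpy.random import rand, seed

seed(1)
first = [rand(), rand()]   # e.g. the random starting point of a search

seed(1)                    # re-seeding reproduces the identical draws
second = [rand(), rand()]

print(first == second)     # -> True
```

<p>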
At the end of the run, the best configuration and result are reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the best result involved using a learning rate slightly above 1 at 1.004 and a regularization weight of about 0.002 achieving a mean accuracy of about 79.1 percent, better than the default configuration that achieved an accuracy of about 78.5 percent.<\/p>\n<p><strong>Can you get a better result?<\/strong><br \/>\nLet me know in the comments below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;0, cfg=[0.5827274503894747, 0.260872709578015] 0.70533\r\n&gt;4, cfg=[0.5449820307807399, 0.3017271170801444] 0.70567\r\n&gt;6, cfg=[0.6286475606495414, 0.17499090243915086] 0.71933\r\n&gt;7, cfg=[0.5956196828965779, 0.0] 0.78633\r\n&gt;8, cfg=[0.5878361167354715, 0.0] 0.78633\r\n&gt;10, cfg=[0.6353507984485595, 0.0] 0.78633\r\n&gt;13, cfg=[0.5690530537610675, 0.0] 0.78633\r\n&gt;17, cfg=[0.6650936023999641, 0.0] 0.78633\r\n&gt;22, cfg=[0.9070451625704087, 0.0] 0.78633\r\n&gt;23, cfg=[0.9253366187387938, 0.0] 0.78633\r\n&gt;26, cfg=[0.9966143540220266, 0.0] 0.78633\r\n&gt;31, cfg=[1.0048613895650054, 0.002162219228449132] 0.79133\r\nDone!\r\ncfg=[1.0048613895650054, 0.002162219228449132]: Mean Accuracy: 0.791333<\/pre>\n<p>Now that we are familiar with how to use a stochastic hill climbing algorithm to tune the hyperparameters of a simple machine learning algorithm, let\u2019s look at tuning a more advanced algorithm, such as XGBoost.<\/p>\n<h2>XGBoost Hyperparameter Optimization<\/h2>\n<p>XGBoost is short for <a href=\"https:\/\/machinelearningmastery.com\/extreme-gradient-boosting-ensemble-in-python\/\">Extreme 
Gradient Boosting<\/a> and is an efficient implementation of the stochastic gradient boosting machine learning algorithm.<\/p>\n<p>The stochastic gradient boosting algorithm, also called gradient boosting machines or tree boosting, is a powerful machine learning technique that performs well or even best on a wide range of challenging machine learning problems.<\/p>\n<p>First, the XGBoost library must be installed.<\/p>\n<p>You can install it using pip, as follows:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">sudo pip install xgboost<\/pre>\n<p>Once installed, you can confirm that it was installed successfully and that you are using a modern version by running the following code:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># xgboost\r\nimport xgboost\r\nprint(\"xgboost\", xgboost.__version__)<\/pre>\n<p>Running the code, you should see the following version number or higher.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">xgboost 1.0.1<\/pre>\n<p>Although the XGBoost library has its own Python API, we can use XGBoost models with the scikit-learn API via the <a href=\"https:\/\/xgboost.readthedocs.io\/en\/latest\/python\/python_api.html\">XGBClassifier wrapper class<\/a>.<\/p>\n<p>An instance of the model can be instantiated and used just like any other scikit-learn class for model evaluation. 
For example:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = XGBClassifier()<\/pre>\n<p>Before we tune the hyperparameters of XGBoost, we can establish a baseline in performance using the default hyperparameters.<\/p>\n<p>We will use the same synthetic binary classification dataset from the previous section and the same test harness of repeated stratified k-fold cross-validation.<\/p>\n<p>The complete example of evaluating the performance of XGBoost with default hyperparameters is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># xgboost with default hyperparameters for binary classification\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom xgboost import XGBClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# define model\r\nmodel = XGBClassifier()\r\n# define evaluation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report result\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 84.9 percent.<\/p>\n<p>We would hope that we can achieve better performance than this with optimized hyperparameters.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.849 (0.040)<\/pre>\n<p>Next, we can adapt the stochastic hill climbing optimization algorithm to tune the hyperparameters of the XGBoost model.<\/p>\n<p>There are many hyperparameters that we may want to optimize for the XGBoost model.<\/p>\n<p>For an overview of how to tune the XGBoost model, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/configure-gradient-boosting-algorithm\/\">How to Configure the Gradient Boosting Algorithm<\/a><\/li>\n<\/ul>\n<p>We will focus on four key hyperparameters; they are:<\/p>\n<ul>\n<li>Learning Rate (<em>learning_rate<\/em>)<\/li>\n<li>Number of Trees (<em>n_estimators<\/em>)<\/li>\n<li>Subsample Percentage (<em>subsample<\/em>)<\/li>\n<li>Tree Depth (<em>max_depth<\/em>)<\/li>\n<\/ul>\n<p>The <strong>learning rate<\/strong> controls the contribution of each tree to the ensemble. Sensible values are less than 1.0 and slightly above 0.0 (e.g. 1e-8).<\/p>\n<p>The <strong>number of trees<\/strong> controls the size of the ensemble, and often, more trees are better, up to a point of diminishing returns. Sensible values are between 1 tree and hundreds or thousands of trees.<\/p>\n<p>The <strong>subsample<\/strong> percentage defines the random sample size used to train each tree, expressed as a percentage of the size of the original dataset. Sensible values are between a value slightly above 0.0 (e.g. 1e-8) and 1.0.<\/p>\n<p>The <strong>tree depth<\/strong> is the number of levels in each tree. Deeper trees are more specific to the training dataset and may overfit. Shorter trees often generalize better. 
Sensible values are between 1 and 10 or 20.<\/p>\n<p>First, we must update the <em>objective()<\/em> function to unpack the hyperparameters of the XGBoost model, configure it, and then evaluate the mean classification accuracy.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(X, y, cfg):\r\n\t# unpack config\r\n\tlrate, n_tree, subsam, depth = cfg\r\n\t# define model\r\n\tmodel = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)\r\n\t# define evaluation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate model\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# calculate mean accuracy\r\n\tresult = mean(scores)\r\n\treturn result<\/pre>\n<p>Next, we need to define the <em>step()<\/em> function used to take a step in the search space.<\/p>\n<p>Each hyperparameter has quite a different range; therefore, we will define the step size (standard deviation of the distribution) separately for each hyperparameter. 
We will also define the step sizes in line rather than as arguments to the function, to keep things simple.<\/p>\n<p>The number of trees and the depth are integers, so the stepped values are rounded.<\/p>\n<p>The step sizes chosen are arbitrary, chosen after a little trial and error.<\/p>\n<p>The updated step function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># take a step in the search space\r\ndef step(cfg):\r\n\t# unpack config\r\n\tlrate, n_tree, subsam, depth = cfg\r\n\t# learning rate\r\n\tlrate = lrate + randn() * 0.01\r\n\tif lrate &lt;= 0.0:\r\n\t\tlrate = 1e-8\r\n\tif lrate &gt; 1:\r\n\t\tlrate = 1.0\r\n\t# number of trees\r\n\tn_tree = round(n_tree + randn() * 50)\r\n\tif n_tree &lt;= 0.0:\r\n\t\tn_tree = 1\r\n\t# subsample percentage\r\n\tsubsam = subsam + randn() * 0.1\r\n\tif subsam &lt;= 0.0:\r\n\t\tsubsam = 1e-8\r\n\tif subsam &gt; 1:\r\n\t\tsubsam = 1.0\r\n\t# max tree depth\r\n\tdepth = round(depth + randn() * 7)\r\n\tif depth &lt;= 1:\r\n\t\tdepth = 1\r\n\t# return new config\r\n\treturn [lrate, n_tree, subsam, depth]<\/pre>\n<p>Finally, the <em>hillclimbing()<\/em> algorithm must be updated to define an initial solution with appropriate values.<\/p>\n<p>In this case, we will define the initial solution with sensible defaults, matching the default hyperparameters, or close to them.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# starting point for the search\r\nsolution = step([0.1, 100, 1.0, 7])<\/pre>\n<p>Tying this together, the complete example of manually tuning the hyperparameters of the XGBoost algorithm using a stochastic hill climbing algorithm is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># xgboost manual hyperparameter optimization for binary classification\r\nfrom numpy import mean\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom numpy.random import randint\r\nfrom sklearn.datasets import make_classification\r\nfrom 
sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom xgboost import XGBClassifier\r\n\r\n# objective function\r\ndef objective(X, y, cfg):\r\n\t# unpack config\r\n\tlrate, n_tree, subsam, depth = cfg\r\n\t# define model\r\n\tmodel = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)\r\n\t# define evaluation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\t# evaluate model\r\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n\t# calculate mean accuracy\r\n\tresult = mean(scores)\r\n\treturn result\r\n\r\n# take a step in the search space\r\ndef step(cfg):\r\n\t# unpack config\r\n\tlrate, n_tree, subsam, depth = cfg\r\n\t# learning rate\r\n\tlrate = lrate + randn() * 0.01\r\n\tif lrate &lt;= 0.0:\r\n\t\tlrate = 1e-8\r\n\tif lrate &gt; 1:\r\n\t\tlrate = 1.0\r\n\t# number of trees\r\n\tn_tree = round(n_tree + randn() * 50)\r\n\tif n_tree &lt;= 0.0:\r\n\t\tn_tree = 1\r\n\t# subsample percentage\r\n\tsubsam = subsam + randn() * 0.1\r\n\tif subsam &lt;= 0.0:\r\n\t\tsubsam = 1e-8\r\n\tif subsam &gt; 1:\r\n\t\tsubsam = 1.0\r\n\t# max tree depth\r\n\tdepth = round(depth + randn() * 7)\r\n\tif depth &lt;= 1:\r\n\t\tdepth = 1\r\n\t# return new config\r\n\treturn [lrate, n_tree, subsam, depth]\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, n_iter):\r\n\t# starting point for the search\r\n\tsolution = step([0.1, 100, 1.0, 7])\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = step(solution)\r\n\t\t# evaluate candidate point\r\n\t\tcandidate_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidate_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidate_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d, cfg=[%s] %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# define the total iterations\r\nn_iter = 200\r\n# perform the hill climbing search\r\ncfg, score = hillclimbing(X, y, objective, n_iter)\r\nprint('Done!')\r\nprint('cfg=[%s]: Mean Accuracy: %f' % (cfg, score))<\/pre>\n<p>Running the example reports the configuration and result each time an improvement is seen during the search. At the end of the run, the best configuration and result are reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the best result involved using a learning rate of about 0.02, 52 trees, a subsample rate of about 50 percent, and a large depth of 53 levels.<\/p>\n<p>This configuration resulted in a mean accuracy of about 87.3 percent, better than the default configuration that achieved an accuracy of about 84.9 percent.<\/p>\n<p><strong>Can you get a better result?<\/strong><br \/>\nLet me know in the comments below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;0, cfg=[[0.1058242692126418, 67, 0.9228490731610172, 12]] 0.85933\r\n&gt;1, cfg=[[0.11060813799692253, 51, 0.859353656735739, 13]] 0.86100\r\n&gt;4, cfg=[[0.11890247679234153, 58, 0.7135275461723894, 12]] 0.86167\r\n&gt;5, cfg=[[0.10226257987735601, 61, 0.6086462443373852, 17]] 0.86400\r\n&gt;15, cfg=[[0.11176962034280596, 106, 0.5592742266405146, 13]] 0.86500\r\n&gt;19, cfg=[[0.09493587069112454, 153, 0.5049124222437619, 34]] 0.86533\r\n&gt;23, 
cfg=[[0.08516531024154426, 88, 0.5895201311518876, 31]] 0.86733\r\n&gt;46, cfg=[[0.10092590898175327, 32, 0.5982811365027455, 30]] 0.86867\r\n&gt;75, cfg=[[0.099469211050998, 20, 0.36372573610040404, 32]] 0.86900\r\n&gt;96, cfg=[[0.09021536590375884, 38, 0.4725379807796971, 20]] 0.86900\r\n&gt;100, cfg=[[0.08979482274655906, 65, 0.3697395430835758, 14]] 0.87000\r\n&gt;110, cfg=[[0.06792737273465625, 89, 0.33827505722318224, 17]] 0.87000\r\n&gt;118, cfg=[[0.05544969684589669, 72, 0.2989721608535262, 23]] 0.87200\r\n&gt;122, cfg=[[0.050102976159097, 128, 0.2043203965148931, 24]] 0.87200\r\n&gt;123, cfg=[[0.031493266763680444, 120, 0.2998819062922256, 30]] 0.87333\r\n&gt;128, cfg=[[0.023324201169625292, 84, 0.4017169945431015, 42]] 0.87333\r\n&gt;140, cfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]] 0.87367\r\nDone!\r\ncfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]]: Mean Accuracy: 0.873667<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/hyperparameter-optimization-with-random-search-and-grid-search\/\">Hyperparameter Optimization With Random Search and Grid Search<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/configure-gradient-boosting-algorithm\/\">How to Configure the Gradient Boosting Algorithm<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-perceptron-algorithm-scratch-python\/\">How To Implement The Perceptron Algorithm From Scratch In Python<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">sklearn.datasets.make_classification API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.accuracy_score.html\">sklearn.metrics.accuracy_score API<\/a>.<\/li>\n<li>\n<a 
href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">numpy.random.rand API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.Perceptron.html\">sklearn.linear_model.Perceptron API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Perceptron\">Perceptron, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/XGBoost\">XGBoost, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to manually optimize the hyperparameters of machine learning algorithms.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Stochastic optimization algorithms can be used instead of grid and random search for hyperparameter optimization.<\/li>\n<li>How to use a stochastic hill climbing algorithm to tune the hyperparameters of the Perceptron algorithm.<\/li>\n<li>How to manually optimize the hyperparameters of the XGBoost gradient boosting algorithm.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/manually-optimize-hyperparameters\/\">How to Manually Optimize Machine Learning Model Hyperparameters<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/manually-optimize-hyperparameters\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Machine learning algorithms have hyperparameters that allow the algorithms to be tailored to specific datasets. 
Although the impact of hyperparameters may be [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/25\/how-to-manually-optimize-machine-learning-model-hyperparameters\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4512,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4511"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4511"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4512"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}