{"id":4379,"date":"2021-02-09T18:00:52","date_gmt":"2021-02-09T18:00:52","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/09\/how-to-use-optimization-algorithms-to-manually-fit-regression-models\/"},"modified":"2021-02-09T18:00:52","modified_gmt":"2021-02-09T18:00:52","slug":"how-to-use-optimization-algorithms-to-manually-fit-regression-models","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/09\/how-to-use-optimization-algorithms-to-manually-fit-regression-models\/","title":{"rendered":"How to Use Optimization Algorithms to Manually Fit Regression Models"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Regression models are fit on training data using linear algebra and local search optimization algorithms.<\/p>\n<p>Linear regression is typically fit by least squares optimization, and logistic regression by a local search procedure; these are the most efficient approaches to finding coefficients that minimize error for these models.<\/p>\n<p>Nevertheless, it is possible to use alternate <strong>optimization algorithms to fit a regression model<\/strong> to a training dataset. This can be a useful exercise to learn more about how regression works and about the central role of optimization in applied machine learning. 
It may also be required for regression with data that does not meet the requirements of a least squares optimization procedure.<\/p>\n<p>In this tutorial, you will discover how to manually optimize the coefficients of regression models.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to develop the inference models for regression from scratch.<\/li>\n<li>How to optimize the coefficients of a linear regression model for predicting numeric values.<\/li>\n<li>How to optimize the coefficients of a logistic regression model using stochastic hill climbing.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_11948\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-11948\" loading=\"lazy\" class=\"size-full wp-image-11948\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Use-Optimization-Algorithms-to-Manually-Fit-Regression-Models.jpg\" alt=\"How to Use Optimization Algorithms to Manually Fit Regression Models\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Use-Optimization-Algorithms-to-Manually-Fit-Regression-Models.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Use-Optimization-Algorithms-to-Manually-Fit-Regression-Models-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Use-Optimization-Algorithms-to-Manually-Fit-Regression-Models-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-11948\" class=\"wp-caption-text\">How to Use Optimization Algorithms to Manually Fit Regression Models<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/collins_family\/31023265312\/\">Christian Collins<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial 
Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Optimize Regression Models<\/li>\n<li>Optimize a Linear Regression Model<\/li>\n<li>Optimize a Logistic Regression Model<\/li>\n<\/ol>\n<h2>Optimize Regression Models<\/h2>\n<p>Regression models, like linear regression and logistic regression, are well-understood algorithms from the field of statistics.<\/p>\n<p>Both algorithms are linear, meaning the output of the model is a weighted sum of the inputs. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_regression\">Linear regression<\/a> is designed for \u201c<em>regression<\/em>\u201d problems that require a number to be predicted, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\">logistic regression<\/a> is designed for \u201c<em>classification<\/em>\u201d problems that require a class label to be predicted.<\/p>\n<p>These regression models involve the use of an optimization algorithm to find a set of coefficients for each input to the model that minimizes the prediction error. Because the models are linear and well understood, efficient optimization algorithms can be used.<\/p>\n<p>In the case of linear regression, the coefficients can be found by least squares optimization, which can be solved using <a href=\"https:\/\/machinelearningmastery.com\/solve-linear-regression-using-linear-algebra\/\">linear algebra<\/a>. In the case of logistic regression, a local search optimization algorithm is commonly used.<\/p>\n<p>It is possible to use any arbitrary optimization algorithm to train linear and logistic regression models.<\/p>\n<p>That is, we can define a regression model and use a given optimization algorithm to find a set of coefficients for the model that result in a minimum of prediction error or a maximum of classification accuracy.<\/p>\n<p>Using alternate optimization algorithms is expected to be less efficient on average than using the recommended optimization. 
Nevertheless, it may be more efficient in some specific cases, such as when the input data does not meet the expectations of the model, for example inputs that have a Gaussian distribution and are uncorrelated with other inputs.<\/p>\n<p>It can also be an interesting exercise to demonstrate the central nature of optimization in training machine learning algorithms, and specifically regression models.<\/p>\n<p>Next, let\u2019s explore how to train a linear regression model using stochastic hill climbing.<\/p>\n<h2>Optimize a Linear Regression Model<\/h2>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/implement-linear-regression-stochastic-gradient-descent-scratch-python\/\">linear regression<\/a> model might be the simplest predictive model that learns from data.<\/p>\n<p>The model has one coefficient for each input and the predicted output is simply the weighted sum of the inputs and coefficients.<\/p>\n<p>In this section, we will optimize the coefficients of a linear regression model.<\/p>\n<p>First, let\u2019s define a synthetic regression problem that we can use as the focus of optimizing the model.<\/p>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">make_regression() function<\/a> to define a regression problem with 1,000 rows and 10 input variables.<\/p>\n<p>The example below creates the dataset and summarizes the shape of the data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># define a regression dataset\r\nfrom sklearn.datasets import make_regression\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)\r\n# summarize the shape of the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example prints the shape of the created dataset, confirming our expectations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 10) (1000,)<\/pre>\n<p>Next, we need to define a linear regression model.<\/p>\n<p>Before we 
optimize the model coefficients, we must develop the model and our confidence in how it works.<\/p>\n<p>Let\u2019s start by developing a function that calculates the activation of the model for a given input row of data from the dataset.<\/p>\n<p>This function will take the row of data and the coefficients for the model and calculate the weighted sum of the input with the addition of an extra y-intercept (also called the offset or bias) coefficient. The <em>predict_row()<\/em> function below implements this.<\/p>\n<p>We are intentionally using simple Python lists and an imperative programming style instead of <a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-n-dimensional-arrays-python-numpy\/\">NumPy arrays<\/a> or list comprehensions to make the code more readable for Python beginners. Feel free to optimize it and post your code in the comments below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># linear regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\treturn result<\/pre>\n<p>Next, we can call the <em>predict_row()<\/em> function for each row in a given dataset. 
The <em>predict_dataset()<\/em> function below implements this.<\/p>\n<p>Again, we are intentionally using a simple imperative coding style for readability instead of list comprehensions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># use model coefficients to generate predictions for a dataset of rows\r\ndef predict_dataset(X, coefficients):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\t# make a prediction\r\n\t\tyhat = predict_row(row, coefficients)\r\n\t\t# store the prediction\r\n\t\tyhats.append(yhat)\r\n\treturn yhats<\/pre>\n<p>Finally, we can use the model to make predictions on our synthetic dataset to confirm it is all working correctly.<\/p>\n<p>We can generate a random set of model coefficients using the <a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">rand() function<\/a>.<\/p>\n<p>Recall that we need one coefficient for each input (ten inputs in this dataset) plus an extra weight for the y-intercept coefficient.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)\r\n# determine the number of coefficients\r\nn_coeff = X.shape[1] + 1\r\n# generate random coefficients\r\ncoefficients = rand(n_coeff)<\/pre>\n<p>We can then use these coefficients with the dataset to make predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, coefficients)<\/pre>\n<p>We can evaluate the mean squared error of these predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate model prediction error\r\nscore = mean_squared_error(y, yhat)\r\nprint('MSE: %f' % score)<\/pre>\n<p>That\u2019s it.<\/p>\n<p>We can tie all of this together and demonstrate our linear regression model for regression predictive modeling. 
The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># linear regression model\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.metrics import mean_squared_error\r\n\r\n# linear regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\treturn result\r\n\r\n# use model coefficients to generate predictions for a dataset of rows\r\ndef predict_dataset(X, coefficients):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\t# make a prediction\r\n\t\tyhat = predict_row(row, coefficients)\r\n\t\t# store the prediction\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)\r\n# determine the number of coefficients\r\nn_coeff = X.shape[1] + 1\r\n# generate random coefficients\r\ncoefficients = rand(n_coeff)\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, coefficients)\r\n# calculate model prediction error\r\nscore = mean_squared_error(y, yhat)\r\nprint('MSE: %f' % score)<\/pre>\n<p>Running the example generates a prediction for each example in the training dataset, then prints the mean squared error for the predictions.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>We would expect a large error given a set of random weights, and that is what we see in this case, with an error value of about 7,307 units.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">MSE: 7307.756740<\/pre>\n<p>We can now optimize the coefficients of the model to achieve low error on this dataset.<\/p>\n<p>First, we need to split the dataset into train and test sets. It is important to hold back some data not used in optimizing the model so that we can prepare a reasonable estimate of the performance of the model when used to make predictions on new data.<\/p>\n<p>We will use 67 percent of the data for training and the remaining 33 percent as a test set for evaluating the performance of the model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# split into train test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)<\/pre>\n<p>Next, we can develop a <a href=\"https:\/\/machinelearningmastery.com\/stochastic-hill-climbing-in-python-from-scratch\/\">stochastic hill climbing algorithm<\/a>.<\/p>\n<p>The optimization algorithm requires an objective function to optimize. 
It must take a set of coefficients and return a score that is to be minimized or maximized, with a better score corresponding to a better model.<\/p>\n<p>In this case, we will evaluate the mean squared error of the model with a given set of coefficients and return the error score, which must be minimized.<\/p>\n<p>The <em>objective()<\/em> function below implements this, given the dataset and a set of coefficients, and returns the error of the model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(X, y, coefficients):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, coefficients)\r\n\t# calculate mean squared error\r\n\tscore = mean_squared_error(y, yhat)\r\n\treturn score<\/pre>\n<p>Next, we can define the stochastic hill climbing algorithm.<\/p>\n<p>The algorithm will require an initial solution (e.g. random coefficients) and will iteratively keep making small changes to the solution and checking whether they result in a better performing model. The amount of change made to the current solution is controlled by a step_size hyperparameter. 
This process will continue for a fixed number of iterations, also provided as a hyperparameter.<\/p>\n<p>The <em>hillclimbing()<\/em> function below implements this, taking the dataset, objective function, initial solution, and hyperparameters as arguments and returning the best set of coefficients found and the estimated performance.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &lt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>We can then call this function, passing in an initial set of coefficients as the initial solution and the training dataset as the dataset to optimize the model against.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the total iterations\r\nn_iter = 2000\r\n# define the step size (scale of the random perturbation)\r\nstep_size = 0.15\r\n# determine the number of coefficients\r\nn_coef = X.shape[1] + 1\r\n# define the initial solution\r\nsolution = rand(n_coef)\r\n# perform the hill climbing search\r\ncoefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)\r\nprint('Done!')\r\nprint('Coefficients: %s' % coefficients)\r\nprint('Train MSE: %f' % (score))<\/pre>\n<p>Finally, we can evaluate the best model on the test dataset and report the performance.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for 
the test dataset\r\nyhat = predict_dataset(X_test, coefficients)\r\n# calculate mean squared error\r\nscore = mean_squared_error(y_test, yhat)\r\nprint('Test MSE: %f' % (score))<\/pre>\n<p>Tying this together, the complete example of optimizing the coefficients of a linear regression model on the synthetic regression dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># optimize linear regression coefficients for regression dataset\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import mean_squared_error\r\n\r\n# linear regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\treturn result\r\n\r\n# use model coefficients to generate predictions for a dataset of rows\r\ndef predict_dataset(X, coefficients):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\t# make a prediction\r\n\t\tyhat = predict_row(row, coefficients)\r\n\t\t# store the prediction\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# objective function\r\ndef objective(X, y, coefficients):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, coefficients)\r\n\t# calculate mean squared error\r\n\tscore = mean_squared_error(y, yhat)\r\n\treturn score\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &lt;= solution_eval:\r\n\t\t\t# 
store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)\r\n# split into train test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\n# define the total iterations\r\nn_iter = 2000\r\n# define the step size (scale of the random perturbation)\r\nstep_size = 0.15\r\n# determine the number of coefficients\r\nn_coef = X.shape[1] + 1\r\n# define the initial solution\r\nsolution = rand(n_coef)\r\n# perform the hill climbing search\r\ncoefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)\r\nprint('Done!')\r\nprint('Coefficients: %s' % coefficients)\r\nprint('Train MSE: %f' % (score))\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, coefficients)\r\n# calculate mean squared error\r\nscore = mean_squared_error(y_test, yhat)\r\nprint('Test MSE: %f' % (score))<\/pre>\n<p>Running the example will report the iteration number and mean squared error each time there is an improvement made to the model.<\/p>\n<p>At the end of the search, the performance of the best set of coefficients on the training dataset is reported and the performance of the same model on the test dataset is calculated and reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the optimization algorithm found a set of coefficients that achieved an error of about 0.08 on both the train and test datasets.<\/p>\n<p>The fact that the algorithm found a model with very similar performance on train and test datasets is a good sign, showing that the model did not over-fit (over-optimize) to the training dataset. This means the model generalizes well to new data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n&gt;1546 0.35426\r\n&gt;1567 0.32863\r\n&gt;1572 0.32322\r\n&gt;1619 0.24890\r\n&gt;1665 0.24800\r\n&gt;1691 0.24162\r\n&gt;1715 0.15893\r\n&gt;1809 0.15337\r\n&gt;1892 0.14656\r\n&gt;1956 0.08042\r\nDone!\r\nCoefficients: [ 1.30559829e-02 -2.58299382e-04  3.33118191e+00  3.20418534e-02\r\n  1.36497902e-01  8.65445367e+01  2.78356715e-02 -8.50901499e-02\r\n  8.90078243e-02  6.15779867e-02 -3.85657793e-02]\r\nTrain MSE: 0.080415\r\nTest MSE: 0.080779<\/pre>\n<p>Now that we are familiar with how to manually optimize the coefficients of a linear regression model, let\u2019s look at how we can extend the example to optimize the coefficients of a logistic regression model for classification.<\/p>\n<h2>Optimize a Logistic Regression Model<\/h2>\n<p>A Logistic Regression model is an extension of linear regression for classification predictive modeling.<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/logistic-regression-with-maximum-likelihood-estimation\/\">Logistic regression<\/a> is for binary classification tasks, meaning datasets that have two class labels, class=0 and class=1.<\/p>\n<p>The output first involves calculating the weighted sum of the inputs, then passing this weighted sum through a logistic function, also called a sigmoid function. 
The result is a <a href=\"https:\/\/machinelearningmastery.com\/discrete-probability-distributions-for-machine-learning\/\">Binomial probability<\/a> between 0 and 1 for the example belonging to class=1.<\/p>\n<p>In this section, we will build on what we learned in the previous section to optimize the coefficients of regression models for classification. We will develop the model and test it with random coefficients, then use stochastic hill climbing to optimize the model coefficients.<\/p>\n<p>First, let\u2019s define a synthetic binary classification problem that we can use as the focus of optimizing the model.<\/p>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to define a binary classification problem with 1,000 rows and five input variables.<\/p>\n<p>The example below creates the dataset and summarizes the shape of the data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># define a binary classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# summarize the shape of the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example prints the shape of the created dataset, confirming our expectations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 5) (1000,)<\/pre>\n<p>Next, we need to define a logistic regression model.<\/p>\n<p>Let\u2019s start by updating the <em>predict_row()<\/em> function to pass the weighted sum of the input and coefficients through a logistic function.<\/p>\n<p>The logistic function is defined as:<\/p>\n<ul>\n<li>logistic = 1.0 \/ (1.0 + exp(-result))<\/li>\n<\/ul>\n<p>Where result is the weighted sum of the inputs and the coefficients and exp()\u00a0is <em>e<\/em> (<a 
href=\"https:\/\/en.wikipedia.org\/wiki\/E_(mathematical_constant)\">Euler\u2019s number<\/a>) raised to the power of the provided value, implemented via the <a href=\"https:\/\/docs.python.org\/3\/library\/math.html#math.exp\">exp() function<\/a>.<\/p>\n<p>The updated <em>predict_row()<\/em> function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># logistic regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\t# logistic function\r\n\tlogistic = 1.0 \/ (1.0 + exp(-result))\r\n\treturn logistic<\/pre>\n<p>That\u2019s about it in terms of changes from linear regression to logistic regression.<\/p>\n<p>As with linear regression, we can test the model with a set of random model coefficients.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# determine the number of coefficients\r\nn_coeff = X.shape[1] + 1\r\n# generate random coefficients\r\ncoefficients = rand(n_coeff)\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, coefficients)<\/pre>\n<p>The predictions made by the model are probabilities for an example belonging to class=1.<\/p>\n<p>We can round the predictions to the integer values 0 and 1 to obtain the expected class labels.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# round predictions to labels\r\nyhat = [round(y) for y in yhat]<\/pre>\n<p>We can evaluate the classification accuracy of these predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint('Accuracy: %f' % score)<\/pre>\n<p>That\u2019s it.<\/p>\n<p>We can tie all of this together and demonstrate our simple logistic regression model for binary classification. 
The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># logistic regression function for binary classification\r\nfrom math import exp\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# logistic regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\t# logistic function\r\n\tlogistic = 1.0 \/ (1.0 + exp(-result))\r\n\treturn logistic\r\n\r\n# use model coefficients to generate predictions for a dataset of rows\r\ndef predict_dataset(X, coefficients):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\t# make a prediction\r\n\t\tyhat = predict_row(row, coefficients)\r\n\t\t# store the prediction\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# determine the number of coefficients\r\nn_coeff = X.shape[1] + 1\r\n# generate random coefficients\r\ncoefficients = rand(n_coeff)\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, coefficients)\r\n# round predictions to labels\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint('Accuracy: %f' % score)<\/pre>\n<p>Running the example generates a prediction for each example in the training dataset then prints the classification accuracy for the predictions.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>We would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Accuracy: 0.540000<\/pre>\n<p>We can now optimize the coefficients of the model to achieve good accuracy on this dataset.<\/p>\n<p>The stochastic hill climbing algorithm used for linear regression can be used again for logistic regression.<\/p>\n<p>The important difference is an update to the <em>objective()<\/em> function to round the predictions and evaluate the model using classification accuracy instead of mean squared error.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(X, y, coefficients):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, coefficients)\r\n\t# round predictions to labels\r\n\tyhat = [round(y) for y in yhat]\r\n\t# calculate accuracy\r\n\tscore = accuracy_score(y, yhat)\r\n\treturn score<\/pre>\n<p>The <em>hillclimbing()<\/em> function also must be updated to maximize the score of solutions instead of minimizing it, as in the case of linear regression.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, 
solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>Finally, the coefficients found by the search can be evaluated using classification accuracy at the end of the run.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, coefficients)\r\n# round predictions to labels\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y_test, yhat)\r\nprint('Test Accuracy: %f' % (score))<\/pre>\n<p>Tying this all together, the complete example of using stochastic hill climbing to maximize classification accuracy of a logistic regression model is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># optimize logistic regression model with a stochastic hill climber\r\nfrom math import exp\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# logistic regression\r\ndef predict_row(row, coefficients):\r\n\t# add the bias, the last coefficient\r\n\tresult = coefficients[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tresult += coefficients[i] * row[i]\r\n\t# logistic function\r\n\tlogistic = 1.0 \/ (1.0 + exp(-result))\r\n\treturn logistic\r\n\r\n# use model coefficients to generate predictions for a dataset of rows\r\ndef predict_dataset(X, coefficients):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\t# make a prediction\r\n\t\tyhat = predict_row(row, coefficients)\r\n\t\t# store the prediction\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# objective function\r\ndef objective(X, y, coefficients):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, coefficients)\r\n\t# round predictions to labels\r\n\tyhat = [round(y) for y in yhat]\r\n\t# calculate accuracy\r\n\tscore = accuracy_score(y, yhat)\r\n\treturn 
score\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# split into train test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\n# define the total iterations\r\nn_iter = 2000\r\n# define the maximum step size\r\nstep_size = 0.1\r\n# determine the number of coefficients\r\nn_coef = X.shape[1] + 1\r\n# define the initial solution\r\nsolution = rand(n_coef)\r\n# perform the hill climbing search\r\ncoefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)\r\nprint('Done!')\r\nprint('Coefficients: %s' % coefficients)\r\nprint('Train Accuracy: %f' % (score))\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, coefficients)\r\n# round predictions to labels\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y_test, yhat)\r\nprint('Test Accuracy: %f' % (score))<\/pre>\n<p>Running the example will report the iteration number and classification accuracy each time there is an improvement made to the model.<\/p>\n<p>At the end of the search, the performance of the best set of coefficients on the training dataset is reported and the performance of the same 
model on the test dataset is calculated and reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the optimization algorithm found a set of weights that achieved about 87.3 percent accuracy on the training dataset and about 83.9 percent accuracy on the test dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n&gt;200 0.85672\r\n&gt;225 0.85672\r\n&gt;230 0.85672\r\n&gt;245 0.86418\r\n&gt;281 0.86418\r\n&gt;285 0.86716\r\n&gt;294 0.86716\r\n&gt;306 0.86716\r\n&gt;316 0.86716\r\n&gt;317 0.86716\r\n&gt;320 0.86866\r\n&gt;348 0.86866\r\n&gt;362 0.87313\r\n&gt;784 0.87313\r\n&gt;1649 0.87313\r\nDone!\r\nCoefficients: [-0.04652756  0.23243427  2.58587637 -0.45528253 -0.4954355  -0.42658053]\r\nTrain Accuracy: 0.873134\r\nTest Accuracy: 0.839394<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/train-test-split-for-evaluating-machine-learning-algorithms\/\">Train-Test Split for Evaluating Machine Learning Algorithms<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-linear-regression-stochastic-gradient-descent-scratch-python\/\">How to Implement Linear Regression From Scratch in Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-logistic-regression-stochastic-gradient-descent-scratch-python\/\">How To Implement Logistic Regression From Scratch in Python<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li>\n<a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">sklearn.datasets.make_regression API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">sklearn.datasets.make_classification API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.mean_squared_error.html\">sklearn.metrics.mean_squared_error API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">numpy.random.rand API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_regression\">Linear regression, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\">Logistic regression, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to manually optimize the coefficients of regression models.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to develop the inference models for regression from scratch.<\/li>\n<li>How to optimize the coefficients of a linear regression model for predicting numeric values.<\/li>\n<li>How to optimize the coefficients of a logistic regression model using stochastic hill climbing.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/optimize-regression-models\/\">How to Use Optimization Algorithms to Manually Fit Regression Models<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/optimize-regression-models\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee 
Regression models are fit on training data using linear regression and local search optimization algorithms. Models like linear regression and logistic regression [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/09\/how-to-use-optimization-algorithms-to-manually-fit-regression-models\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4380,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4379"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4379"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4379\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4380"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","temp
lated":true}]}}