{"id":4153,"date":"2020-12-03T18:00:15","date_gmt":"2020-12-03T18:00:15","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/how-to-manually-optimize-neural-network-models\/"},"modified":"2020-12-03T18:00:15","modified_gmt":"2020-12-03T18:00:15","slug":"how-to-manually-optimize-neural-network-models","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/how-to-manually-optimize-neural-network-models\/","title":{"rendered":"How to Manually Optimize Neural Network Models"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><strong>Deep learning neural network<\/strong> models are fit on training data using the stochastic gradient descent optimization algorithm.<\/p>\n<p>Updates to the weights of the model are made, using the backpropagation of error algorithm. The combination of the optimization and weight update algorithm was carefully chosen and is the most efficient approach known to fit neural networks.<\/p>\n<p>Nevertheless, it is possible to use alternate optimization algorithms to fit a neural network model to a training dataset. This can be a useful exercise to learn more about how neural networks function and the central nature of optimization in applied machine learning. 
It may also be required for neural networks with unconventional model architectures and non-differentiable transfer functions.<\/p>\n<p>In this tutorial, you will discover how to manually optimize the weights of neural network models.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to develop the forward inference pass for neural network models from scratch.<\/li>\n<li>How to optimize the weights of a Perceptron model for binary classification.<\/li>\n<li>How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_11942\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-11942\" loading=\"lazy\" class=\"size-full wp-image-11942\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Neural-Network-Models.jpg\" alt=\"How to Manually Optimize Neural Network Models\" width=\"800\" height=\"532\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Neural-Network-Models.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Neural-Network-Models-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/03\/How-to-Manually-Optimize-Neural-Network-Models-768x511.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-11942\" class=\"wp-caption-text\">How to Manually Optimize Neural Network Models<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/mypubliclands\/26153922644\/\">Bureau of Land Management<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Optimize Neural Networks<\/li>\n<li>Optimize a Perceptron 
Model<\/li>\n<li>Optimize a Multilayer Perceptron<\/li>\n<\/ol>\n<h2>Optimize Neural Networks<\/h2>\n<p><a href=\"https:\/\/machinelearningmastery.com\/what-is-deep-learning\/\">Deep learning<\/a> or neural networks are a flexible type of machine learning.<\/p>\n<p>They are models composed of nodes and layers inspired by the structure and function of the brain. A neural network model works by propagating a given input vector through one or more layers to produce a numeric output that can be interpreted for classification or regression predictive modeling.<\/p>\n<p>Models are trained by repeatedly exposing the model to examples of input and output and adjusting the weights to minimize the error of the model&rsquo;s output compared to the expected output. This is called the stochastic gradient descent optimization algorithm. The weights of the model are adjusted using a specific rule from calculus that assigns error proportionally to each weight in the network. This is called the <a href=\"https:\/\/machinelearningmastery.com\/implement-backpropagation-algorithm-scratch-python\/\">backpropagation algorithm<\/a>.<\/p>\n<p>The stochastic gradient descent optimization algorithm with weight updates made using backpropagation is the best way to train neural network models. However, it is not the only way to train a neural network.<\/p>\n<p>It is possible to use any arbitrary optimization algorithm to train a neural network model.<\/p>\n<p>That is, we can define a neural network model architecture and use a given optimization algorithm to find a set of weights for the model that results in a minimum of prediction error or a maximum of classification accuracy.<\/p>\n<p>Using alternate optimization algorithms is expected to be less efficient on average than using stochastic gradient descent with backpropagation. 
Nevertheless, it may be more efficient in some specific cases, such as non-standard network architectures or non-differentiable transfer functions.<\/p>\n<p>It can also be an interesting exercise to demonstrate the central nature of optimization in training machine learning algorithms, and specifically neural networks.<\/p>\n<p>Next, let&rsquo;s explore how to train a simple one-node neural network called a Perceptron model using stochastic hill climbing.<\/p>\n<h2>Optimize a Perceptron Model<\/h2>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/implement-perceptron-algorithm-scratch-python\/\">Perceptron algorithm<\/a> is the simplest type of artificial neural network.<\/p>\n<p>It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.<\/p>\n<p>In this section, we will optimize the weights of a Perceptron neural network model.<\/p>\n<p>First, let&rsquo;s define a synthetic binary classification problem that we can use as the focus of optimizing the model.<\/p>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to define a binary classification problem with 1,000 rows and five input variables.<\/p>\n<p>The example below creates the dataset and summarizes the shape of the data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># define a binary classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# summarize the shape of the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example prints the shape of the created dataset, confirming our expectations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(1000, 5) (1000,)<\/pre>\n<p>Next, we need to define a Perceptron 
model.<\/p>\n<p>The Perceptron model has a single node that has one input weight for each column in the dataset.<\/p>\n<p>Each input is multiplied by its corresponding weight to give a weighted sum, and a bias weight is then added, like an intercept coefficient in a regression model. This weighted sum is called the activation. Finally, the activation is interpreted and used to predict the class label, 1 for a positive or zero activation and 0 for a negative activation.<\/p>\n<p>Before we optimize the model weights, we must develop the model and our confidence in how it works.<\/p>\n<p>Let&rsquo;s start by defining a function for interpreting the activation of the model.<\/p>\n<p>This is called the activation function, or the transfer function; the latter name is more traditional and is my preference.<\/p>\n<p>The <em>transfer()<\/em> function below takes the activation of the model and returns a class label, class=1 for a positive or zero activation and class=0 for a negative activation. This is called a step transfer function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># transfer function\r\ndef transfer(activation):\r\n\tif activation &gt;= 0.0:\r\n\t\treturn 1\r\n\treturn 0<\/pre>\n<p>Next, we can develop a function that calculates the activation of the model for a given input row of data from the dataset.<\/p>\n<p>This function will take the row of data and the weights for the model and calculate the weighted sum of the input with the addition of the bias weight. The <em>activate()<\/em> function below implements this.<\/p>\n<p><strong>Note<\/strong>: We are using simple Python lists and imperative programming style instead of NumPy arrays or list comprehensions intentionally to make the code more readable for Python beginners. 
Feel free to optimize it and post your code in the comments below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># activation function\r\ndef activate(row, weights):\r\n\t# add the bias, the last weight\r\n\tactivation = weights[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tactivation += weights[i] * row[i]\r\n\treturn activation<\/pre>\n<p>Next, we can use the <em>activate()<\/em> and <em>transfer()<\/em> functions together to generate a prediction for a given row of data. The <em>predict_row()<\/em> function below implements this.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># use model weights to predict 0 or 1 for a given row of data\r\ndef predict_row(row, weights):\r\n\t# activate for input\r\n\tactivation = activate(row, weights)\r\n\t# transfer for activation\r\n\treturn transfer(activation)<\/pre>\n<p>Next, we can call the <em>predict_row()<\/em> function for each row in a given dataset. The <em>predict_dataset()<\/em> function below implements this.<\/p>\n<p>Again, we are intentionally using a simple imperative coding style for readability instead of list comprehensions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># use model weights to generate predictions for a dataset of rows\r\ndef predict_dataset(X, weights):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\tyhat = predict_row(row, weights)\r\n\t\tyhats.append(yhat)\r\n\treturn yhats<\/pre>\n<p>Finally, we can use the model to make predictions on our synthetic dataset to confirm it is all working correctly.<\/p>\n<p>We can generate a random set of model weights using the <a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">rand() function<\/a>.<\/p>\n<p>Recall that we need one weight for each input (five inputs in this dataset) plus an extra weight for the bias.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, 
n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# determine the number of weights\r\nn_weights = X.shape[1] + 1\r\n# generate random weights\r\nweights = rand(n_weights)<\/pre>\n<p>We can then use these weights with the dataset to make predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, weights)<\/pre>\n<p>We can evaluate the classification accuracy of these predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint(score)<\/pre>\n<p>That&rsquo;s it.<\/p>\n<p>We can tie all of this together and demonstrate our simple Perceptron model for classification. The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># simple perceptron model for binary classification\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# transfer function\r\ndef transfer(activation):\r\n\tif activation &gt;= 0.0:\r\n\t\treturn 1\r\n\treturn 0\r\n\r\n# activation function\r\ndef activate(row, weights):\r\n\t# add the bias, the last weight\r\n\tactivation = weights[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tactivation += weights[i] * row[i]\r\n\treturn activation\r\n\r\n# use model weights to predict 0 or 1 for a given row of data\r\ndef predict_row(row, weights):\r\n\t# activate for input\r\n\tactivation = activate(row, weights)\r\n\t# transfer for activation\r\n\treturn transfer(activation)\r\n\r\n# use model weights to generate predictions for a dataset of rows\r\ndef predict_dataset(X, weights):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\tyhat = predict_row(row, weights)\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, 
random_state=1)\r\n# determine the number of weights\r\nn_weights = X.shape[1] + 1\r\n# generate random weights\r\nweights = rand(n_weights)\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, weights)\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint(score)<\/pre>\n<p>Running the example generates a prediction for each example in the dataset, then prints the classification accuracy for the predictions.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>We would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">0.548<\/pre>\n<p>We can now optimize the weights of the model to achieve good accuracy on this dataset.<\/p>\n<p>First, we need to split the dataset into <a href=\"https:\/\/machinelearningmastery.com\/train-test-split-for-evaluating-machine-learning-algorithms\/\">train and test sets<\/a>. It is important to hold back some data not used in optimizing the model so that we can prepare a reasonable estimate of the performance of the model when used to make predictions on new data.<\/p>\n<p>We will use 67 percent of the data for training and the remaining 33 percent as a test set for evaluating the performance of the model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# split into train test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)<\/pre>\n<p>Next, we can develop a stochastic hill climbing algorithm.<\/p>\n<p>The optimization algorithm requires an objective function to optimize. 
The objective function must take a set of weights and return a score to be minimized or maximized, where a better score corresponds to a better model.<\/p>\n<p>In this case, we will evaluate the accuracy of the model with a given set of weights and return the classification accuracy, which must be maximized.<\/p>\n<p>The <em>objective()<\/em> function below implements this, given the dataset and a set of weights, and returns the accuracy of the model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(X, y, weights):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, weights)\r\n\t# calculate accuracy\r\n\tscore = accuracy_score(y, yhat)\r\n\treturn score<\/pre>\n<p>Next, we can define the <a href=\"https:\/\/machinelearningmastery.com\/stochastic-hill-climbing-in-python-from-scratch\/\">stochastic hill climbing algorithm<\/a>.<\/p>\n<p>The algorithm will require an initial solution (e.g. random weights) and will iteratively keep making small changes to the solution and checking whether they result in a better performing model. The amount of change made to the current solution is controlled by a <em>step_size<\/em> hyperparameter. 
This process will continue for a fixed number of iterations, also provided as a hyperparameter.<\/p>\n<p>The <em>hillclimbing()<\/em> function below implements this, taking the dataset, objective function, initial solution, and hyperparameters as arguments and returning the best set of weights found and the estimated performance.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>We can then call this function, passing in a set of weights as the initial solution and the training dataset as the dataset to optimize the model against.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the total iterations\r\nn_iter = 1000\r\n# define the maximum step size\r\nstep_size = 0.05\r\n# determine the number of weights\r\nn_weights = X.shape[1] + 1\r\n# define the initial solution\r\nsolution = rand(n_weights)\r\n# perform the hill climbing search\r\nweights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)\r\nprint('Done!')\r\nprint('f(%s) = %f' % (weights, score))<\/pre>\n<p>Finally, we can evaluate the best model on the test dataset and report the performance.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, 
weights)\r\n# calculate accuracy\r\nscore = accuracy_score(y_test, yhat)\r\nprint('Test Accuracy: %.5f' % (score * 100))<\/pre>\n<p>Tying this together, the complete example of optimizing the weights of a Perceptron model on the synthetic binary classification dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing to optimize weights of a perceptron model for classification\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# transfer function\r\ndef transfer(activation):\r\n\tif activation &gt;= 0.0:\r\n\t\treturn 1\r\n\treturn 0\r\n\r\n# activation function\r\ndef activate(row, weights):\r\n\t# add the bias, the last weight\r\n\tactivation = weights[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tactivation += weights[i] * row[i]\r\n\treturn activation\r\n\r\n# use model weights to predict 0 or 1 for a given row of data\r\ndef predict_row(row, weights):\r\n\t# activate for input\r\n\tactivation = activate(row, weights)\r\n\t# transfer for activation\r\n\treturn transfer(activation)\r\n\r\n# use model weights to generate predictions for a dataset of rows\r\ndef predict_dataset(X, weights):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\tyhat = predict_row(row, weights)\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# objective function\r\ndef objective(X, y, weights):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, weights)\r\n\t# calculate accuracy\r\n\tscore = accuracy_score(y, yhat)\r\n\treturn score\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a 
step\r\n\t\tcandidate = solution + randn(len(solution)) * step_size\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %.5f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# split into train test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\n# define the total iterations\r\nn_iter = 1000\r\n# define the maximum step size\r\nstep_size = 0.05\r\n# determine the number of weights\r\nn_weights = X.shape[1] + 1\r\n# define the initial solution\r\nsolution = rand(n_weights)\r\n# perform the hill climbing search\r\nweights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)\r\nprint('Done!')\r\nprint('f(%s) = %f' % (weights, score))\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, weights)\r\n# calculate accuracy\r\nscore = accuracy_score(y_test, yhat)\r\nprint('Test Accuracy: %.5f' % (score * 100))<\/pre>\n<p>Running the example will report the iteration number and classification accuracy each time there is an improvement made to the model.<\/p>\n<p>At the end of the search, the performance of the best set of weights on the training dataset is reported and the performance of the same model on the test dataset is calculated and reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the optimization algorithm found a set of weights that achieved about 88.5 percent accuracy on the training dataset and about 81.8 percent accuracy on the test dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n&gt;111 0.88060\r\n&gt;119 0.88060\r\n&gt;126 0.88209\r\n&gt;134 0.88209\r\n&gt;205 0.88209\r\n&gt;262 0.88209\r\n&gt;280 0.88209\r\n&gt;293 0.88209\r\n&gt;297 0.88209\r\n&gt;336 0.88209\r\n&gt;373 0.88209\r\n&gt;437 0.88358\r\n&gt;463 0.88507\r\n&gt;630 0.88507\r\n&gt;701 0.88507\r\nDone!\r\nf([ 0.0097317 0.13818088 1.17634326 -0.04296336 0.00485813 -0.14767616]) = 0.885075\r\nTest Accuracy: 81.81818<\/pre>\n<p>Now that we are familiar with how to manually optimize the weights of a Perceptron model, let&rsquo;s look at how we can extend the example to optimize the weights of a Multilayer Perceptron (MLP) model.<\/p>\n<h2>Optimize a Multilayer Perceptron<\/h2>\n<p>A Multilayer Perceptron (MLP) model is a neural network with one or more layers, where each layer has one or more nodes.<\/p>\n<p>It is an extension of a Perceptron model and is perhaps the most widely used neural network (deep learning) model.<\/p>\n<p>In this section, we will build on what we learned in the previous section to optimize the weights of MLP models with an arbitrary number of layers and nodes per layer.<\/p>\n<p>First, we will develop the model and test it with random weights, then use stochastic hill climbing to optimize the model weights.<\/p>\n<p>When using MLPs for binary classification, it is common to use a sigmoid transfer function (also called the logistic function) instead of the step transfer function used in the Perceptron.<\/p>\n<p>This function outputs a real value between 0 and 1 that represents a <a href=\"https:\/\/machinelearningmastery.com\/discrete-probability-distributions-for-machine-learning\/\">binomial probability 
distribution<\/a>, e.g. the probability that an example belongs to class=1. The <em>transfer()<\/em> function below implements this.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># transfer function\r\ndef transfer(activation):\r\n\t# sigmoid transfer function\r\n\treturn 1.0 \/ (1.0 + exp(-activation))<\/pre>\n<p>We can use the same <em>activate()<\/em> function from the previous section. Here, we will use it to calculate the activation for each node in a given layer.<\/p>\n<p>The <em>predict_row()<\/em> function must be replaced with a more elaborate version.<\/p>\n<p>The function takes a row of data and the network and returns the output of the network.<\/p>\n<p>We will define our network as a list of lists. Each layer will be a list of nodes and each node will be a list or array of weights.<\/p>\n<p>To calculate the prediction of the network, we simply enumerate the layers, then enumerate nodes, then calculate the activation and transfer output for each node. In this case, we will use the same transfer function for all nodes in the network, although this does not have to be the case.<\/p>\n<p>For networks with more than one layer, the output from the previous layer is used as input to each node in the next layer. 
The output from the final layer in the network is then returned.<\/p>\n<p>The <em>predict_row()<\/em> function below implements this.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># activation function for a network\r\ndef predict_row(row, network):\r\n\tinputs = row\r\n\t# enumerate the layers in the network from input to output\r\n\tfor layer in network:\r\n\t\tnew_inputs = list()\r\n\t\t# enumerate nodes in the layer\r\n\t\tfor node in layer:\r\n\t\t\t# activate the node\r\n\t\t\tactivation = activate(inputs, node)\r\n\t\t\t# transfer activation\r\n\t\t\toutput = transfer(activation)\r\n\t\t\t# store output\r\n\t\t\tnew_inputs.append(output)\r\n\t\t# output from this layer is input to the next layer\r\n\t\tinputs = new_inputs\r\n\treturn inputs[0]<\/pre>\n<p>That&rsquo;s about it.<\/p>\n<p>Finally, we need to define a network to use.<\/p>\n<p>For example, we can define an MLP with a single hidden layer with a single node as follows:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create a one node network\r\nnode = rand(n_inputs + 1)\r\nlayer = [node]\r\nnetwork = [layer]<\/pre>\n<p>This is practically a Perceptron, although with a sigmoid transfer function. Quite boring.<\/p>\n<p>Let&rsquo;s define an MLP with one hidden layer and one output layer. The first hidden layer will have 10 nodes, and each node will take the input pattern from the dataset (e.g. five inputs). 
The output layer will have a single node that takes inputs from the outputs of the first hidden layer and then outputs a prediction.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# one hidden layer and an output layer\r\nn_hidden = 10\r\nhidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]\r\noutput1 = [rand(n_hidden + 1)]\r\nnetwork = [hidden1, output1]<\/pre>\n<p>We can then use the model to make predictions on the dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, network)<\/pre>\n<p>Before we calculate the classification accuracy, we must round the predictions to class labels 0 and 1.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# round the predictions\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint(score)<\/pre>\n<p>Tying this all together, the complete example of evaluating an MLP with random initial weights on our synthetic binary classification dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># develop an mlp model for classification\r\nfrom math import exp\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# transfer function\r\ndef transfer(activation):\r\n\t# sigmoid transfer function\r\n\treturn 1.0 \/ (1.0 + exp(-activation))\r\n\r\n# activation function\r\ndef activate(row, weights):\r\n\t# add the bias, the last weight\r\n\tactivation = weights[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tactivation += weights[i] * row[i]\r\n\treturn activation\r\n\r\n# activation function for a network\r\ndef predict_row(row, network):\r\n\tinputs = row\r\n\t# enumerate the layers in the network from input to output\r\n\tfor layer in network:\r\n\t\tnew_inputs = list()\r\n\t\t# enumerate nodes in the layer\r\n\t\tfor node in 
layer:\r\n\t\t\t# activate the node\r\n\t\t\tactivation = activate(inputs, node)\r\n\t\t\t# transfer activation\r\n\t\t\toutput = transfer(activation)\r\n\t\t\t# store output\r\n\t\t\tnew_inputs.append(output)\r\n\t\t# output from this layer is input to the next layer\r\n\t\tinputs = new_inputs\r\n\treturn inputs[0]\r\n\r\n# use model weights to generate predictions for a dataset of rows\r\ndef predict_dataset(X, network):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\tyhat = predict_row(row, network)\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# determine the number of inputs\r\nn_inputs = X.shape[1]\r\n# one hidden layer and an output layer\r\nn_hidden = 10\r\nhidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]\r\noutput1 = [rand(n_hidden + 1)]\r\nnetwork = [hidden1, output1]\r\n# generate predictions for dataset\r\nyhat = predict_dataset(X, network)\r\n# round the predictions\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y, yhat)\r\nprint(score)<\/pre>\n<p>Running the example generates a prediction for each example in the training dataset, then prints the classification accuracy for the predictions.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>Again, we would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">0.499<\/pre>\n<p>Next, we can apply the <a href=\"https:\/\/machinelearningmastery.com\/stochastic-hill-climbing-in-python-from-scratch\/\">stochastic hill climbing algorithm<\/a> to the dataset.<\/p>\n<p>It is very much the same as applying hill climbing to the Perceptron model, except in this case, a step requires a modification to all weights in the network.<\/p>\n<p>For this, we will develop a new function that creates a copy of the network and mutates each weight in the network while making the copy.<\/p>\n<p>The <em>step()<\/em> function below implements this.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># take a step in the search space\r\ndef step(network, step_size):\r\n\tnew_net = list()\r\n\t# enumerate layers in the network\r\n\tfor layer in network:\r\n\t\tnew_layer = list()\r\n\t\t# enumerate nodes in this layer\r\n\t\tfor node in layer:\r\n\t\t\t# mutate the node\r\n\t\t\tnew_node = node.copy() + randn(len(node)) * step_size\r\n\t\t\t# store node in layer\r\n\t\t\tnew_layer.append(new_node)\r\n\t\t# store layer in network\r\n\t\tnew_net.append(new_layer)\r\n\treturn new_net<\/pre>\n<p>Modifying all weights in the network is aggressive.<\/p>\n<p>A less aggressive step in the search space might be to make a small change to a subset of the weights in the model, perhaps controlled by a hyperparameter. 
This is left as an extension.<\/p>\n<p>We can then call this new <em>step()<\/em> function from the hillclimbing() function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = step(solution, step_size)\r\n\t\t# evaluate candidate point\r\n\t\tcandidte_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidte_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, candidte_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>Tying this together, the complete example of applying stochastic hill climbing to optimize the weights of an MLP model for binary classification is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># stochastic hill climbing to optimize a multilayer perceptron for classification\r\nfrom math import exp\r\nfrom numpy.random import randn\r\nfrom numpy.random import rand\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# transfer function\r\ndef transfer(activation):\r\n\t# sigmoid transfer function\r\n\treturn 1.0 \/ (1.0 + exp(-activation))\r\n\r\n# activation function\r\ndef activate(row, weights):\r\n\t# add the bias, the last weight\r\n\tactivation = weights[-1]\r\n\t# add the weighted input\r\n\tfor i in range(len(row)):\r\n\t\tactivation += weights[i] * row[i]\r\n\treturn activation\r\n\r\n# activation function for a network\r\ndef predict_row(row, network):\r\n\tinputs = row\r\n\t# enumerate the layers in the network from input to output\r\n\tfor layer in 
network:\r\n\t\tnew_inputs = list()\r\n\t\t# enumerate nodes in the layer\r\n\t\tfor node in layer:\r\n\t\t\t# activate the node\r\n\t\t\tactivation = activate(inputs, node)\r\n\t\t\t# transfer activation\r\n\t\t\toutput = transfer(activation)\r\n\t\t\t# store output\r\n\t\t\tnew_inputs.append(output)\r\n\t\t# output from this layer is input to the next layer\r\n\t\tinputs = new_inputs\r\n\treturn inputs[0]\r\n\r\n# use model weights to generate predictions for a dataset of rows\r\ndef predict_dataset(X, network):\r\n\tyhats = list()\r\n\tfor row in X:\r\n\t\tyhat = predict_row(row, network)\r\n\t\tyhats.append(yhat)\r\n\treturn yhats\r\n\r\n# objective function\r\ndef objective(X, y, network):\r\n\t# generate predictions for dataset\r\n\tyhat = predict_dataset(X, network)\r\n\t# round the predictions\r\n\tyhat = [round(y) for y in yhat]\r\n\t# calculate accuracy\r\n\tscore = accuracy_score(y, yhat)\r\n\treturn score\r\n\r\n# take a step in the search space\r\ndef step(network, step_size):\r\n\tnew_net = list()\r\n\t# enumerate layers in the network\r\n\tfor layer in network:\r\n\t\tnew_layer = list()\r\n\t\t# enumerate nodes in this layer\r\n\t\tfor node in layer:\r\n\t\t\t# mutate the node\r\n\t\t\tnew_node = node.copy() + randn(len(node)) * step_size\r\n\t\t\t# store node in layer\r\n\t\t\tnew_layer.append(new_node)\r\n\t\t# store layer in network\r\n\t\tnew_net.append(new_layer)\r\n\treturn new_net\r\n\r\n# hill climbing local search algorithm\r\ndef hillclimbing(X, y, objective, solution, n_iter, step_size):\r\n\t# evaluate the initial point\r\n\tsolution_eval = objective(X, y, solution)\r\n\t# run the hill climb\r\n\tfor i in range(n_iter):\r\n\t\t# take a step\r\n\t\tcandidate = step(solution, step_size)\r\n\t\t# evaluate candidate point\r\n\t\tcandidate_eval = objective(X, y, candidate)\r\n\t\t# check if we should keep the new point\r\n\t\tif candidate_eval &gt;= solution_eval:\r\n\t\t\t# store the new point\r\n\t\t\tsolution, solution_eval = candidate, 
candidate_eval\r\n\t\t\t# report progress\r\n\t\t\tprint('&gt;%d %f' % (i, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\n# define the total iterations\r\nn_iter = 1000\r\n# define the step size (scale of the Gaussian perturbation)\r\nstep_size = 0.1\r\n# determine the number of inputs\r\nn_inputs = X.shape[1]\r\n# one hidden layer and an output layer\r\nn_hidden = 10\r\nhidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]\r\noutput1 = [rand(n_hidden + 1)]\r\nnetwork = [hidden1, output1]\r\n# perform the hill climbing search\r\nnetwork, score = hillclimbing(X_train, y_train, objective, network, n_iter, step_size)\r\nprint('Done!')\r\nprint('Best: %f' % (score))\r\n# generate predictions for the test dataset\r\nyhat = predict_dataset(X_test, network)\r\n# round the predictions\r\nyhat = [round(y) for y in yhat]\r\n# calculate accuracy\r\nscore = accuracy_score(y_test, yhat)\r\nprint('Test Accuracy: %.5f' % (score * 100))<\/pre>\n<p>Running the example will report the iteration number and classification accuracy each time a candidate is accepted, which happens whenever it scores at least as well as the current solution.<\/p>\n<p>At the end of the search, the performance of the best set of weights on the training dataset is reported and the performance of the same model on the test dataset is calculated and reported.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the optimization algorithm found a set of weights that achieved about 87.3 percent accuracy on the training dataset and about 85.1 percent accuracy on the test dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n&gt;55 0.755224\r\n&gt;56 0.765672\r\n&gt;59 0.794030\r\n&gt;66 0.805970\r\n&gt;77 0.835821\r\n&gt;120 0.838806\r\n&gt;165 0.840299\r\n&gt;188 0.841791\r\n&gt;218 0.846269\r\n&gt;232 0.852239\r\n&gt;237 0.852239\r\n&gt;239 0.855224\r\n&gt;292 0.867164\r\n&gt;368 0.868657\r\n&gt;823 0.868657\r\n&gt;852 0.871642\r\n&gt;889 0.871642\r\n&gt;892 0.871642\r\n&gt;992 0.873134\r\nDone!\r\nBest: 0.873134\r\nTest Accuracy: 85.15152<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/train-test-split-for-evaluating-machine-learning-algorithms\/\">Train-Test Split for Evaluating Machine Learning Algorithms<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-perceptron-algorithm-scratch-python\/\">How To Implement The Perceptron Algorithm From Scratch In Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-backpropagation-algorithm-scratch-python\/\">How to Code a Neural Network with Backpropagation In Python (from scratch)<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">sklearn.datasets.make_classification API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.accuracy_score.html\">sklearn.metrics.accuracy_score API<\/a>.<\/li>\n<li><a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">numpy.random.rand 
API<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to manually optimize the weights of neural network models.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to develop the forward inference pass for neural network models from scratch.<\/li>\n<li>How to optimize the weights of a Perceptron model for binary classification.<\/li>\n<li>How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/manually-optimize-neural-networks\/\">How to Manually Optimize Neural Network Models<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/manually-optimize-neural-networks\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Deep learning neural network models are fit on training data using the stochastic gradient descent optimization algorithm. 
Updates to the weights of [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/how-to-manually-optimize-neural-network-models\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4154,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4153"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4153"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4153\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4154"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}