{"id":1714,"date":"2019-02-12T18:00:04","date_gmt":"2019-02-12T18:00:04","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/12\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/"},"modified":"2019-02-12T18:00:04","modified_gmt":"2019-02-12T18:00:04","slug":"how-to-control-neural-network-model-capacity-with-nodes-and-layers","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/12\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/","title":{"rendered":"How to Control Neural Network Model Capacity With Nodes and Layers"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>The capacity of a deep learning neural network model controls the scope of the types of mapping functions that it is able to learn.<\/p>\n<p>A model with too little capacity cannot learn the training dataset meaning it will underfit, whereas a model with too much capacity may memorize the training dataset, meaning it will overfit or may get stuck or lost during the optimization process.<\/p>\n<p>The capacity of a neural network model is defined by configuring the number of nodes and the number of layers.<\/p>\n<p>In this tutorial, you will discover how to control the capacity of a neural network model and how capacity impacts what a model is capable of learning.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Neural network model capacity is controlled both by the number of nodes and the number of layers in the model.<\/li>\n<li>A model with a single hidden layer and sufficient number of nodes has the capability of learning any mapping function, but the chosen learning algorithm may or may not be able to realize this capability.<\/li>\n<li>Increasing the number of layers provides a short-cut to increasing the capacity of the model with fewer resources, and modern techniques allow learning algorithms to successfully train deep models.<\/li>\n<\/ul>\n<p>Let\u2019s get 
started.<\/p>\n<div id=\"attachment_6990\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6990\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/How-to-Control-Neural-Network-Model-Capacity-With-Nodes-and-Layers.jpg\" alt=\"How to Control Neural Network Model Capacity With Nodes and Layers\" width=\"640\" height=\"400\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/How-to-Control-Neural-Network-Model-Capacity-With-Nodes-and-Layers.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/How-to-Control-Neural-Network-Model-Capacity-With-Nodes-and-Layers-300x188.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">How to Control Neural Network Model Capacity With Nodes and Layers<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/volvob12b\/9497132539\/\">Bernard Spragg. NZ<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Controlling Neural Network Model Capacity<\/li>\n<li>Configure Nodes and Layers in Keras<\/li>\n<li>Multi-Class Classification Problem<\/li>\n<li>Change Model Capacity With Nodes<\/li>\n<li>Change Model Capacity With Layers<\/li>\n<\/ol>\n<h2>Controlling Neural Network Model Capacity<\/h2>\n<p>The goal of a neural network is to learn how to map input examples to output examples.<\/p>\n<p>Neural networks learn mapping functions. 
The capacity of a network refers to the range or scope of the types of functions that the model can approximate.<\/p>\n<blockquote>\n<p>Informally, a model\u2019s capacity is its ability to fit a wide variety of functions.<\/p>\n<\/blockquote>\n<p>\u2014 Pages 111-112, <a href=\"https:\/\/amzn.to\/2IXzUIY\">Deep Learning<\/a>, 2016.<\/p>\n<p>A model with too little capacity may not be able to learn the training dataset sufficiently. A model with more capacity can represent a wider range of functions and may be able to learn a function that maps inputs to outputs in the training dataset well. A model with too much capacity, however, may memorize the training dataset and fail to generalize, or may get lost or stuck in the search for a suitable mapping function.<\/p>\n<p>Generally, we can think of model capacity as a control over whether the model is likely to underfit or overfit a training dataset.<\/p>\n<blockquote>\n<p>We can control whether a model is more likely to overfit or underfit by altering its capacity.<\/p>\n<\/blockquote>\n<p>\u2014 Page 111, <a href=\"https:\/\/amzn.to\/2IXzUIY\">Deep Learning<\/a>, 2016.<\/p>\n<p>The capacity of a neural network can be controlled by two aspects of the model:<\/p>\n<ul>\n<li>Number of Nodes.<\/li>\n<li>Number of Layers.<\/li>\n<\/ul>\n<p>A model with more nodes or more layers has a greater capacity and, in turn, is potentially capable of learning a larger set of mapping functions.<\/p>\n<blockquote>\n<p>A model with more layers and more hidden units per layer has higher representational capacity \u2014 it is capable of representing more complicated functions.<\/p>\n<\/blockquote>\n<p>\u2014 Page 428, <a href=\"https:\/\/amzn.to\/2IXzUIY\">Deep Learning<\/a>, 2016.<\/p>\n<p>The number of nodes in a layer is referred to as the <strong>width<\/strong>.<\/p>\n<p>Developing wide networks with one layer and many nodes was relatively straightforward. 
In theory, a network with enough nodes in the single hidden layer can learn to approximate any mapping function, although in practice, we don\u2019t know how many nodes are sufficient or how to train such a model.<\/p>\n<p>The number of layers in a model is referred to as its <strong>depth<\/strong>.<\/p>\n<p>Increasing the depth increases the capacity of the model. Training deep models, i.e. those with many hidden layers, can be computationally more efficient than training a single layer network with a vast number of nodes.<\/p>\n<blockquote>\n<p>Modern deep learning provides a very powerful framework for supervised learning. By adding more layers and more units within a layer, a deep network can represent functions of increasing complexity.<\/p>\n<\/blockquote>\n<p>\u2014 Page 167, <a href=\"https:\/\/amzn.to\/2IXzUIY\">Deep Learning<\/a>, 2016.<\/p>\n<p>Traditionally, it has been challenging to train neural network models with more than a few layers due to problems such as vanishing gradients. 
More recently, modern methods have made it possible to train deep network models, enabling the development of models of surprising depth that achieve impressive performance on challenging problems in a wide range of domains.<\/p>\n<h2>Configure Nodes and Layers in Keras<\/h2>\n<p>Keras allows you to easily add nodes and layers to your model.<\/p>\n<h3>Configuring Model Nodes<\/h3>\n<p>The first argument of a layer specifies the number of nodes used in the layer.<\/p>\n<p>Fully connected layers for the Multilayer Perceptron, or MLP, model are added via the Dense layer.<\/p>\n<p>For example, we can create one fully-connected layer with 32 nodes as follows:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nlayer = Dense(32)<\/pre>\n<p>Similarly, the number of nodes can be specified for 
recurrent neural network layers in the same way.<\/p>\n<p>For example, we can create one LSTM layer with 32 nodes (or units) as follows:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nlayer = LSTM(32)<\/pre>\n<p>Convolutional neural network layers, or CNN layers, don\u2019t have nodes; instead, you specify the number of filter maps and their shape. The number and size of filter maps define the capacity of the layer.<\/p>\n<p>We can define a two-dimensional convolutional layer with 32 filter maps, each with a size of 3 by 3, as follows:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nlayer = Conv2D(32, (3,3))<\/pre>\n<\/p>\n<h3>Configuring Model Layers<\/h3>\n<p>Layers are added to a sequential model by calling the add() function and passing in the layer.<\/p>\n<p>Fully connected layers for the MLP can be added via repeated calls to add(), passing in the configured Dense layers; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nmodel = Sequential()\r\nmodel.add(Dense(32))\r\nmodel.add(Dense(64))<\/pre>\n<p>Similarly, layers can be added to a recurrent network in the same way to give a stacked recurrent model.<\/p>\n<p>An important difference is that recurrent layers expect a three-dimensional input; therefore, the prior recurrent layer must return the full sequence of outputs rather than the single output for each node at the end of the input sequence.<\/p>\n<p>This can be achieved by setting the \u201c<em>return_sequences<\/em>\u201d argument to \u201c<em>True<\/em>\u201d. 
For example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nmodel = Sequential()\r\nmodel.add(LSTM(32, return_sequences=True))\r\nmodel.add(LSTM(32))<\/pre>\n<p>Convolutional layers can be stacked directly, and it is common to stack one or two convolutional layers together followed by a pooling layer, then repeat this pattern of layers; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nmodel = Sequential()\r\nmodel.add(Conv2D(16, (3,3)))\r\nmodel.add(Conv2D(16, (3,3)))\r\nmodel.add(MaxPooling2D((2,2)))\r\nmodel.add(Conv2D(32, (3,3)))\r\nmodel.add(Conv2D(32, (3,3)))\r\nmodel.add(MaxPooling2D((2,2)))<\/pre>\n<p>Now that we know how to configure the number of nodes and layers for models in Keras, we can look at how the capacity affects model performance on a multi-class classification problem.<\/p>\n<h2>Multi-Class Classification Problem<\/h2>\n<p>We will use a standard multi-class classification problem as the basis to demonstrate the effect of model capacity on model performance.<\/p>\n<p>The scikit-learn library provides the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">make_blobs() function<\/a> that can be used to create a multi-class classification problem with the prescribed number of samples, input variables, classes, and variance of samples within a class.<\/p>\n<p>We can configure the problem to have a specific number of input variables via the \u201c<em>n_features<\/em>\u201d argument, and a specific number of classes or centers via the \u201c<em>centers<\/em>\u201d argument. 
The \u201c<em>random_state<\/em>\u201d argument can be used to seed the pseudorandom number generator to ensure that we always get the same samples each time the function is called.<\/p>\n<p>For example, the call below generates 1,000 examples for a three-class problem with two input variables.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# generate 2d classification dataset\r\nX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)<\/pre>\n<p>The result is the input and output elements of a dataset that we can model.<\/p>\n<p>In order to get a feeling for the complexity of the problem, we can plot each point on a two-dimensional scatter plot and color each point by class value.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># scatter plot of blobs dataset\r\nfrom sklearn.datasets import make_blobs\r\nfrom matplotlib import pyplot\r\nfrom numpy import where\r\n# generate 2d classification dataset\r\nX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)\r\n# scatter plot for each class value\r\nfor class_value in range(3):\r\n\t# select indices of points with the class label\r\n\trow_ix = where(y == class_value)\r\n\t# scatter plot for points with a different color\r\n\tpyplot.scatter(X[row_ix, 0], X[row_ix, 1])\r\n# show plot\r\npyplot.show()<\/pre>\n<p>Running the example creates a scatter plot of the entire dataset. 
We can see that the chosen standard deviation of 2.0 means that the classes are not linearly separable (separable by a line), causing many ambiguous points.<\/p>\n<p>This is desirable as it means that the problem is non-trivial and will allow a neural network model to find many different \u201c<em>good enough<\/em>\u201d candidate solutions.<\/p>\n<div id=\"attachment_6987\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6987\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Scatter-Plot-of-Blobs-Dataset-with-Three-Classes-and-Points-Colored-by-Class-Value-3.png\" alt=\"Scatter Plot of Blobs Dataset With Three Classes and Points Colored by Class Value\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plot-of-Blobs-Dataset-with-Three-Classes-and-Points-Colored-by-Class-Value-3.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plot-of-Blobs-Dataset-with-Three-Classes-and-Points-Colored-by-Class-Value-3-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plot-of-Blobs-Dataset-with-Three-Classes-and-Points-Colored-by-Class-Value-3-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plot-of-Blobs-Dataset-with-Three-Classes-and-Points-Colored-by-Class-Value-3-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Scatter Plot of Blobs Dataset With Three Classes and Points Colored by Class Value<\/p>\n<\/div>\n<p>In order to explore model capacity, we need more complexity in the problem than three classes and two variables.<\/p>\n<p>For the purposes of the following experiments, we will use 100 input features and 20 classes; for 
example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# generate multi-class classification dataset\r\nX, y = make_blobs(n_samples=1000, centers=20, n_features=100, cluster_std=2, random_state=2)<\/pre>\n<\/p>\n<h2>Change Model Capacity With Nodes<\/h2>\n<p>In this section, we will develop a Multilayer Perceptron model, or MLP, for the blobs multi-class classification problem and demonstrate the effect that the number of nodes has on the ability of the model to learn.<\/p>\n<p>We can start off by developing a function to prepare the dataset.<\/p>\n<p>The input and output elements of the dataset can be created using the <em>make_blobs()<\/em> function as described in the previous section.<\/p>\n<p>Next, the target variable must be one-hot encoded. This is so that the model can learn to predict the probability of an input example belonging to each of the 20 classes.<\/p>\n<p>We can use the <a href=\"https:\/\/keras.io\/utils\/#to_categorical\">to_categorical() Keras utility function<\/a> to do this, for example:<\/p>\n<pre class=\"crayon-plain-tag\"># one hot encode output variable\r\ny = to_categorical(y)<\/pre>\n<p>Next, we can split the 1,000 examples in half and use 500 examples as the training dataset and 500 to evaluate the model.<\/p>\n<pre class=\"crayon-plain-tag\"># split into train and test\r\nn_train = 500\r\ntrainX, testX = X[:n_train, :], X[n_train:, :]\r\ntrainy, testy = y[:n_train], y[n_train:]<\/pre>\n<p>The <em>create_dataset()<\/em> function below ties these elements together and returns the train and test sets in terms of the input and output elements.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare multi-class classification dataset\r\ndef create_dataset():\r\n\t# generate multi-class classification dataset\r\n\tX, y = make_blobs(n_samples=1000, centers=20, n_features=100, cluster_std=2, random_state=2)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 
500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy<\/pre>\n<p>We can call this function to prepare the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare dataset\r\ntrainX, trainy, testX, testy = create_dataset()<\/pre>\n<p>Next, we can define a function that will create the model, fit it on the training dataset, and then evaluate it on the test dataset.<\/p>\n<p>The model needs to know the number of input variables in order to configure the input layer and the number of target classes in order to configure the output layer. These properties can be extracted from the training dataset directly.<\/p>\n<pre class=\"crayon-plain-tag\"># configure the model based on the data\r\nn_input, n_classes = trainX.shape[1], testy.shape[1]<\/pre>\n<p>We will define an MLP model with a single hidden layer that uses the rectified linear activation function and the He random weight initialization method.<\/p>\n<p>The output layer will use the softmax activation function in order to predict a probability for each target class. 
The number of nodes in the hidden layer will be provided via an argument called \u201c<em>n_nodes<\/em>\u201d.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(Dense(n_nodes, input_dim=n_input, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(Dense(n_classes, activation='softmax'))<\/pre>\n<p>The model will be optimized using stochastic gradient descent with a modest learning rate of 0.01 and a high momentum of 0.9, and a categorical cross entropy loss function will be used, suitable for multi-class classification.<\/p>\n<pre class=\"crayon-plain-tag\"># compile model\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\nmodel.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])<\/pre>\n<p>The model will be fit for 100 training epochs, then the model will be evaluated on the test dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model on train set\r\nhistory = model.fit(trainX, trainy, epochs=100, verbose=0)\r\n# evaluate model on test set\r\n_, test_acc = model.evaluate(testX, testy, verbose=0)<\/pre>\n<p>Tying these elements together, the <em>evaluate_model()<\/em> function below takes the number of nodes and dataset as arguments and returns the history of the training loss at the end of each epoch and the accuracy of the final model on the test dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model with given number of nodes, returns training history and test set accuracy\r\ndef evaluate_model(n_nodes, trainX, trainy, testX, testy):\r\n\t# configure the model based on the data\r\n\tn_input, n_classes = trainX.shape[1], testy.shape[1]\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(n_nodes, input_dim=n_input, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(n_classes, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(learning_rate=0.01, momentum=0.9)\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])\r\n\t# fit model on train 
set\r\n\thistory = model.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t# evaluate model on test set\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\treturn history, test_acc<\/pre>\n<p>We can call this function with different numbers of nodes to use in the hidden layer.<\/p>\n<p>The problem is relatively simple; therefore, we will review the performance of the model with 1 to 7 nodes.<\/p>\n<p>We would expect that increasing the number of nodes would increase the capacity of the model and allow the model to better learn the training dataset, at least to a point limited by the chosen configuration for the learning algorithm (e.g. learning rate, batch size, and epochs).<\/p>\n<p>The test accuracy for each configuration will be printed and the learning curves of training loss for each configuration will be plotted.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate model and plot learning curve with given number of nodes\r\nnum_nodes = [1, 2, 3, 4, 5, 6, 7]\r\nfor n_nodes in num_nodes:\r\n\t# evaluate model with a given number of nodes\r\n\thistory, result = evaluate_model(n_nodes, trainX, trainy, testX, testy)\r\n\t# summarize final test set accuracy\r\n\tprint('nodes=%d: %.3f' % (n_nodes, result))\r\n\t# plot learning curve\r\n\tpyplot.plot(history.history['loss'], label=str(n_nodes))\r\n# show the plot\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>The full code listing is provided below for completeness.<\/p>\n<pre class=\"crayon-plain-tag\"># study of mlp learning curves given different number of nodes for multi-class classification\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.layers import Dense\r\nfrom keras.models import Sequential\r\nfrom keras.optimizers import SGD\r\nfrom keras.utils import to_categorical\r\nfrom matplotlib import pyplot\r\n\r\n# prepare multi-class classification dataset\r\ndef create_dataset():\r\n\t# generate multi-class classification dataset\r\n\tX, y = make_blobs(n_samples=1000, 
centers=20, n_features=100, cluster_std=2, random_state=2)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# fit model with given number of nodes, returns training history and test set accuracy\r\ndef evaluate_model(n_nodes, trainX, trainy, testX, testy):\r\n\t# configure the model based on the data\r\n\tn_input, n_classes = trainX.shape[1], testy.shape[1]\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(n_nodes, input_dim=n_input, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(n_classes, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(learning_rate=0.01, momentum=0.9)\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])\r\n\t# fit model on train set\r\n\thistory = model.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t# evaluate model on test set\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\treturn history, test_acc\r\n\r\n# prepare dataset\r\ntrainX, trainy, testX, testy = create_dataset()\r\n# evaluate model and plot learning curve with given number of nodes\r\nnum_nodes = [1, 2, 3, 4, 5, 6, 7]\r\nfor n_nodes in num_nodes:\r\n\t# evaluate model with a given number of nodes\r\n\thistory, result = evaluate_model(n_nodes, trainX, trainy, testX, testy)\r\n\t# summarize final test set accuracy\r\n\tprint('nodes=%d: %.3f' % (n_nodes, result))\r\n\t# plot learning curve\r\n\tpyplot.plot(history.history['loss'], label=str(n_nodes))\r\n# show the plot\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>Running the example first prints the test accuracy for each model configuration.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. 
Try running the example a few times.<\/p>\n<p>In this case, we can see that as the number of nodes is increased, the capacity of the model to learn the problem is increased. This results in a general lowering of the generalization error of the model on the test dataset (with a small dip in accuracy at 5 nodes) until 6 and 7 nodes, when the model learns the problem perfectly.<\/p>\n<pre class=\"crayon-plain-tag\">nodes=1: 0.138\r\nnodes=2: 0.380\r\nnodes=3: 0.582\r\nnodes=4: 0.890\r\nnodes=5: 0.844\r\nnodes=6: 1.000\r\nnodes=7: 1.000<\/pre>\n<p>A line plot is also created showing cross entropy loss on the training dataset for each model configuration (1 to 7 nodes in the hidden layer) over the 100 training epochs.<\/p>\n<p>We can see that as the number of nodes is increased, the model is able to better decrease the loss, i.e. to better learn the training dataset. This plot shows the direct relationship between model capacity, as defined by the number of nodes in the hidden layer, and the model\u2019s ability to learn.<\/p>\n<div id=\"attachment_6988\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6988\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Nodes.png\" alt=\"Line Plot of Cross Entropy Loss Over Training Epochs for an MLP on the Training Dataset for the Blobs Multi-Class Classification Problem When Varying Model Nodes\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Nodes.png 1280w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Nodes-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Nodes-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Nodes-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of Cross Entropy Loss Over Training Epochs for an MLP on the Training Dataset for the Blobs Multi-Class Classification Problem When Varying Model Nodes<\/p>\n<\/div>\n<p>The number of nodes can be increased to the point (e.g. 1,000 nodes) where the learning algorithm is no longer able to sufficiently learn the mapping function.<\/p>\n<h2>Change Model Capacity With Layers<\/h2>\n<p>We can perform a similar analysis and evaluate how the number of layers impacts the ability of the model to learn the mapping function.<\/p>\n<p>Increasing the number of layers can often greatly increase the capacity of the model, acting like a computational and learning shortcut to modeling a problem. For example, a model with one hidden layer of 10 nodes is not equivalent to a model with two hidden layers with five nodes each. 
The latter has a much greater capacity.<\/p>\n<p>The danger is that a model with more capacity than is required is likely to overfit the training data, and as with a model that has too many nodes, a model with too many layers will likely be unable to learn the training dataset, getting lost or stuck during the optimization process.<\/p>\n<p>First, we can update the <em>evaluate_model()<\/em> function to fit an MLP model with a given number of layers.<\/p>\n<p>We know from the previous section that an MLP with about seven or more nodes fit for 100 epochs will learn the problem perfectly. We will, therefore, use 10 nodes in each layer to ensure the model has enough capacity in just one layer to learn the problem.<\/p>\n<p>The updated function is listed below, taking the number of layers and dataset as arguments and returning the training history and test accuracy of the model.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model with given number of layers, returns training history and test set accuracy\r\ndef evaluate_model(n_layers, trainX, trainy, testX, testy):\r\n\t# configure the model based on the data\r\n\tn_input, n_classes = trainX.shape[1], testy.shape[1]\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(10, input_dim=n_input, activation='relu', kernel_initializer='he_uniform'))\r\n\tfor _ in range(1, n_layers):\r\n\t\tmodel.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(n_classes, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(learning_rate=0.01, momentum=0.9)\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])\r\n\t# fit model\r\n\thistory = model.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t# evaluate model on test set\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\treturn history, test_acc<\/pre>\n<p>Given that a single hidden layer model has enough capacity to learn this problem, we will explore increasing the number of layers to the point where the learning 
algorithm becomes unstable and can no longer learn the problem.<\/p>\n<p>If the chosen modeling problem were more complex, we could explore increasing the layers and review the improvements in model performance to a point of diminishing returns.<\/p>\n<p>In this case, we will evaluate the model with 1 to 5 layers, with the expectation that at some point, the number of layers will result in a model that the chosen learning algorithm is unable to fit to the training data.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate model and plot learning curve of model with given number of layers\r\nnum_layers = [1, 2, 3, 4, 5]\r\nfor n_layers in num_layers:\r\n\t# evaluate model with a given number of layers\r\n\thistory, result = evaluate_model(n_layers, trainX, trainy, testX, testy)\r\n\tprint('layers=%d: %.3f' % (n_layers, result))\r\n\t# plot learning curve\r\n\tpyplot.plot(history.history['loss'], label=str(n_layers))\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>Tying these elements together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># study of mlp learning curves given different number of layers for multi-class classification\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Dense\r\nfrom keras.optimizers import SGD\r\nfrom keras.utils import to_categorical\r\nfrom matplotlib import pyplot\r\n\r\n# prepare multi-class classification dataset\r\ndef create_dataset():\r\n\t# generate multi-class classification dataset\r\n\tX, y = make_blobs(n_samples=1000, centers=20, n_features=100, cluster_std=2, random_state=2)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# fit model with given number of layers, returns training history and test set accuracy\r\ndef 
evaluate_model(n_layers, trainX, trainy, testX, testy):\r\n\t# configure the model based on the data\r\n\tn_input, n_classes = trainX.shape[1], testy.shape[1]\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(10, input_dim=n_input, activation='relu', kernel_initializer='he_uniform'))\r\n\tfor _ in range(1, n_layers):\r\n\t\tmodel.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(n_classes, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(learning_rate=0.01, momentum=0.9)\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])\r\n\t# fit model\r\n\thistory = model.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t# evaluate model on test set\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\treturn history, test_acc\r\n\r\n# get dataset\r\ntrainX, trainy, testX, testy = create_dataset()\r\n# evaluate model and plot learning curve of model with given number of layers\r\nnum_layers = [1, 2, 3, 4, 5]\r\nfor n_layers in num_layers:\r\n\t# evaluate model with a given number of layers\r\n\thistory, result = evaluate_model(n_layers, trainX, trainy, testX, testy)\r\n\tprint('layers=%d: %.3f' % (n_layers, result))\r\n\t# plot learning curve\r\n\tpyplot.plot(history.history['loss'], label=str(n_layers))\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>Running the example first prints the test accuracy for each model configuration.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the model is capable of learning the problem well with up to three layers, then begins to falter. 
Performance drops sharply with five layers and would be expected to keep falling if the number of layers were increased further.<\/p>\n<pre class=\"crayon-plain-tag\">layers=1: 1.000\r\nlayers=2: 1.000\r\nlayers=3: 1.000\r\nlayers=4: 0.948\r\nlayers=5: 0.794<\/pre>\n<p>A line plot is also created showing cross entropy loss on the training dataset for each model configuration (1 to 5 layers) over the 100 training epochs.<\/p>\n<p>We can see that the dynamics of the models with 1, 2, and 3 layers (blue, orange and green) are pretty similar, learning the problem quickly.<\/p>\n<p>Surprisingly, training loss with four and five layers shows signs of initially doing well, then leaping up, suggesting that the model is likely stuck with a sub-optimal set of weights rather than overfitting the training dataset.<\/p>\n<div id=\"attachment_6989\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6989\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Layers.png\" alt=\"Line Plot of Cross Entropy Loss Over Training Epochs for an MLP on the Training Dataset for the Blobs Multi-Class Classification Problem When Varying Model Layers\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Layers.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Layers-300x225.png 300w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Layers-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Line-Plot-of-Cross-Entropy-Loss-Over-Training-Epochs-for-an-MLP-on-the-Training-Dataset-for-the-Blobs-Multi-Class-Classification-Problem-When-Varying-Model-Layers-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of Cross Entropy Loss Over Training Epochs for an MLP on the Training Dataset for the Blobs Multi-Class Classification Problem When Varying Model Layers<\/p>\n<\/div>\n<p>The analysis shows that increasing model capacity by adding depth is a very effective tool, but one that must be used with caution, as it can quickly produce a high-capacity model that the learning algorithm cannot easily fit to the training dataset.<\/p>\n<h2>Extensions<\/h2>\n<p>This section lists some ideas for extending the tutorial that you may wish to explore.<\/p>\n<ul>\n<li><strong>Too Many Nodes<\/strong>. Update the experiment of increasing nodes to find the point where the learning algorithm is no longer capable of learning the problem.<\/li>\n<li><strong>Repeated Evaluation<\/strong>. Update an experiment to evaluate each configuration repeatedly to counter the stochastic nature of the learning algorithm.<\/li>\n<li><strong>Harder Problem<\/strong>. 
Repeat the experiment of increasing layers on a problem that requires the increased capacity provided by increased depth in order to perform well.<\/li>\n<\/ul>\n<p>If you explore any of these extensions, I\u2019d love to know.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Posts<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network\/\">How to Configure the Number of Layers and Nodes in a Neural Network<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2vhyW8j\">Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks<\/a>, 1999.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2IXzUIY\">Deep Learning<\/a>, 2016.<\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/keras.io\/layers\/core\/\">Keras Core Layers API<\/a><\/li>\n<li><a href=\"https:\/\/keras.io\/layers\/convolutional\/\">Keras Convolutional Layers API<\/a><\/li>\n<li><a href=\"https:\/\/keras.io\/layers\/recurrent\/\">Keras Recurrent Layers API<\/a><\/li>\n<li><a href=\"https:\/\/keras.io\/utils\/\">Keras Utility Functions<\/a><\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">sklearn.datasets.make_blobs API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"http:\/\/www.faqs.org\/faqs\/ai-faq\/neural-nets\/part3\/section-9.html\">How many hidden layers should I use?, comp.ai.neural-nets FAQ<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to control the capacity of a neural network model and how capacity impacts what a model is capable of learning.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Neural network model capacity is controlled both by the number of nodes and the number of layers in the model.<\/li>\n<li>A model with a single hidden layer and a sufficient number of nodes has the 
capability of learning any mapping function, but the chosen learning algorithm may or may not be able to realize this capability.<\/li>\n<li>Increasing the number of layers provides a short-cut to increasing the capacity of the model with fewer resources, and modern techniques allow learning algorithms to successfully train deep models.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/\">How to Control Neural Network Model Capacity With Nodes and Layers<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee The capacity of a deep learning neural network model controls the scope of the types of mapping functions that it is able [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/12\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":1715,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1714"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1714"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1714\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/1715"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}