{"id":1692,"date":"2019-02-07T18:00:26","date_gmt":"2019-02-07T18:00:26","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/07\/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks\/"},"modified":"2019-02-07T18:00:26","modified_gmt":"2019-02-07T18:00:26","slug":"how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/07\/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks\/","title":{"rendered":"How to Improve Performance With Transfer Learning for Deep Learning Neural Networks"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>An interesting benefit of deep learning neural networks is that they can be reused on related problems.<\/p>\n<p>Transfer learning refers to a technique for predictive modeling on a different but somehow similar problem that can then be reused partly or wholly to accelerate the training and improve the performance of a model on the problem of interest.<\/p>\n<p>In deep learning, this means reusing the weights in one or more layers from a pre-trained network model in a new model and either keeping the weights fixed, fine tuning them, or adapting the weights entirely when training the model.<\/p>\n<p>In this tutorial, you will discover how to use transfer learning to improve the performance deep learning neural networks in Python with Keras.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Transfer learning is a method for reusing a model trained on a related predictive modeling problem.<\/li>\n<li>Transfer learning can be used to accelerate the training of neural networks as either a weight initialization scheme or feature extraction method.<\/li>\n<li>How to use transfer learning to improve the performance of an MLP for a multiclass classification problem.<\/li>\n<\/ul>\n<p>Let\u2019s get 
started.<\/p>\n<div id=\"attachment_6979\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6979\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/How-to-Improve-Performance-With-Transfer-Learning-for-Deep-Learning-Neural-Networks.jpg\" alt=\"How to Improve Performance With Transfer Learning for Deep Learning Neural Networks\" width=\"640\" height=\"428\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/How-to-Improve-Performance-With-Transfer-Learning-for-Deep-Learning-Neural-Networks.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/How-to-Improve-Performance-With-Transfer-Learning-for-Deep-Learning-Neural-Networks-300x201.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">How to Improve Performance With Transfer Learning for Deep Learning Neural Networks<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/23024164@N06\/13885404633\/\">Damian Gadal<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into six parts; they are:<\/p>\n<ol>\n<li>What Is Transfer Learning?<\/li>\n<li>Blobs Multi-Class Classification Problem<\/li>\n<li>Multilayer Perceptron Model for Problem 1<\/li>\n<li>Standalone MLP Model for Problem 2<\/li>\n<li>MLP With Transfer Learning for Problem 2<\/li>\n<li>Comparison of Models on Problem 2<\/li>\n<\/ol>\n<h2>What Is Transfer Learning?<\/h2>\n<p>Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second related problem.<\/p>\n<blockquote>\n<p>Transfer learning and domain adaptation refer to the situation where what has been learned in one setting (i.e., distribution P1) is exploited to improve generalization in another setting (say distribution P2).<\/p>\n<\/blockquote>\n<p>\u2014 Page 536, 
<a href=\"https:\/\/amzn.to\/2NJW3gE\">Deep Learning<\/a>, 2016.<\/p>\n<p>In deep learning, transfer learning is a technique whereby a neural network model is first trained on a problem similar to the problem that is being solved. One or more layers from the trained model are then used in a new model trained on the problem of interest.<\/p>\n<blockquote>\n<p>This is typically understood in a supervised learning context, where the input is the same but the target may be of a different nature. For example, we may learn about one set of visual categories, such as cats and dogs, in the first setting, then learn about a different set of visual categories, such as ants and wasps, in the second setting.<\/p>\n<\/blockquote>\n<p>\u2014 Page 536, <a href=\"https:\/\/amzn.to\/2NJW3gE\">Deep Learning<\/a>, 2016.<\/p>\n<p>Transfer learning has the benefit of decreasing the training time for a neural network model and resulting in lower generalization error.<\/p>\n<p>There are two main approaches to implementing transfer learning; they are:<\/p>\n<ul>\n<li>Weight Initialization.<\/li>\n<li>Feature Extraction.<\/li>\n<\/ul>\n<p>The weights in re-used layers may be used as the starting point for the training process and adapted in response to the new problem. This usage treats transfer learning as a type of weight initialization scheme. 
This may be useful when the first related problem has a lot more labeled data than the problem of interest and the similarity in the structure of the problem may be useful in both contexts.<\/p>\n<blockquote>\n<p>\u2026 the objective is to take advantage of data from the first setting to extract information that may be useful when learning or even when directly making predictions in the second setting.<\/p>\n<\/blockquote>\n<p>\u2014 Page 538, <a href=\"https:\/\/amzn.to\/2NJW3gE\">Deep Learning<\/a>, 2016.<\/p>\n<p>Alternatively, the weights of the network may not be adapted in response to the new problem, and only new layers after the reused layers may be trained to interpret their output. This usage treats transfer learning as a type of feature extraction scheme. An example of this approach is the re-use of deep convolutional neural network models trained for photo classification as feature extractors when developing <a href=\"https:\/\/machinelearningmastery.com\/develop-a-deep-learning-caption-generation-model-in-python\/\">photo captioning models<\/a>.<\/p>\n<p>Variations on these usages may involve not training the weights of the model on the new problem initially, but later fine-tuning all weights of the learned model with a small learning rate.<\/p>\n<h2>Blobs Multi-Class Classification Problem<\/h2>\n<p>We will use a small multi-class classification problem as the basis to demonstrate transfer learning.<\/p>\n<p>The scikit-learn library provides the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">make_blobs() function<\/a> that can be used to create a multi-class classification problem with the prescribed number of samples, input variables, classes, and variance of samples within a class.<\/p>\n<p>We can configure the problem to have two input variables (to represent the <em>x<\/em> and <em>y<\/em> coordinates of the points) and a standard deviation of 2.0 for points within each group. We will use the same random state (seed for the pseudorandom number generator) to ensure that we always get the same data points.<\/p>\n<pre class=\"crayon-plain-tag\"># generate 2d classification dataset\r\nX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=1)<\/pre>\n<p>The results are the input and output elements of a dataset that we can model.<\/p>\n<p>The \u201c<em>random_state<\/em>\u201d argument can be varied to give different versions of the problem (different cluster centers). 
We can use this to generate samples from two different problems: train a model on one problem and re-use the weights to better learn a model for a second problem.<\/p>\n<p>Specifically, we will refer to <em>random_state=1<\/em> as Problem 1 and <em>random_state=2<\/em> as Problem 2.<\/p>\n<ul>\n<li><strong>Problem 1<\/strong>. Blobs problem with two input variables and three classes with the <em>random_state<\/em> argument set to one.<\/li>\n<li><strong>Problem 2<\/strong>. Blobs problem with two input variables and three classes with the <em>random_state<\/em> argument set to two.<\/li>\n<\/ul>\n<p>In order to get a feeling for the complexity of the problem, we can plot each point on a two-dimensional scatter plot and color each point by class value.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># plot of blobs multiclass classification problems 1 and 2\r\nfrom sklearn.datasets import make_blobs\r\nfrom numpy import where\r\nfrom matplotlib import pyplot\r\n\r\n# generate samples for blobs problem with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\treturn X, y\r\n\r\n# create a scatter plot of points colored by class value\r\ndef plot_samples(X, y, classes=3):\r\n\t# plot points for each class\r\n\tfor i in range(classes):\r\n\t\t# select indices of points with each class label\r\n\t\tsamples_ix = where(y == i)\r\n\t\t# plot points for this class with a given color\r\n\t\tpyplot.scatter(X[samples_ix, 0], X[samples_ix, 1])\r\n\r\n# generate multiple problems\r\nn_problems = 2\r\nfor i in range(1, n_problems+1):\r\n\t# specify subplot\r\n\tpyplot.subplot(210 + i)\r\n\t# generate samples\r\n\tX, y = samples_for_seed(i)\r\n\t# scatter plot of samples\r\n\tplot_samples(X, y)\r\n# plot figure\r\npyplot.show()<\/pre>\n<p>Running the example generates a sample of 1,000 examples for 
Problem 1 and Problem 2 and creates a scatter plot for each sample, coloring the data points by their class value.<\/p>\n<div id=\"attachment_6974\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6974\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Scatter-Plots-of-Blobs-Dataset-for-Problems-1-and-2-with-Three-Classes-and-Points-Colored-by-Class-Value.png\" alt=\"Scatter Plots of Blobs Dataset for Problems 1 and 2 With Three Classes and Points Colored by Class Value\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plots-of-Blobs-Dataset-for-Problems-1-and-2-with-Three-Classes-and-Points-Colored-by-Class-Value.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plots-of-Blobs-Dataset-for-Problems-1-and-2-with-Three-Classes-and-Points-Colored-by-Class-Value-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plots-of-Blobs-Dataset-for-Problems-1-and-2-with-Three-Classes-and-Points-Colored-by-Class-Value-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Scatter-Plots-of-Blobs-Dataset-for-Problems-1-and-2-with-Three-Classes-and-Points-Colored-by-Class-Value-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Scatter Plots of Blobs Dataset for Problems 1 and 2 With Three Classes and Points Colored by Class Value<\/p>\n<\/div>\n<p>This provides a good basis for transfer learning as each version of the problem has similar input data with a similar scale, although with different target information (e.g. cluster centers).<\/p>\n<p>We would expect that aspects of a model fit on one version of the blobs problem (e.g. 
Problem 1) to be useful when fitting a model on a new version of the blobs problem (e.g. Problem 2).<\/p>\n<h2>Multilayer Perceptron Model for Problem 1<\/h2>\n<p>In this section, we will develop a Multilayer Perceptron (MLP) model for Problem 1 and save the model to file so that we can reuse the weights later.<\/p>\n<p>First, we will develop a function to prepare the dataset for modeling. After the make_blobs() function is called with a given random seed (e.g. one, in this case, for Problem 1), the target variable must be one hot encoded so that we can develop a model that predicts the probability of a given sample belonging to each of the target classes.<\/p>\n<p>The prepared samples can then be split in half, with 500 examples each for the train and test datasets. The <em>samples_for_seed()<\/em> function below implements this, preparing the dataset for a given random number seed and returning the train and test sets split into input and output components.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare blobs samples with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy<\/pre>\n<p>We can call this function to prepare a dataset for Problem 1 as follows.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare data\r\ntrainX, trainy, testX, testy = samples_for_seed(1)<\/pre>\n<p>Next, we can define and fit a model on the training dataset.<\/p>\n<p>The model will expect two inputs for the two variables in the data. The model will have two hidden layers with five nodes each and the rectified linear activation function. 
Two layers are probably not required for this problem, although we\u2019re interested in the model learning some deep structure that we can reuse across instances of this problem. The output layer has three nodes, one for each class in the target variable, and uses the softmax activation function.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(Dense(3, activation='softmax'))<\/pre>\n<p>Given that the problem is a multi-class classification problem, the categorical cross-entropy loss function is minimized, and stochastic gradient descent with the default learning rate and no momentum is used to learn the problem.<\/p>\n<pre class=\"crayon-plain-tag\"># compile model\r\nmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])<\/pre>\n<p>The model is fit for 100 epochs on the training dataset, and the test set is used as a validation dataset during training, evaluating the performance on both datasets at the end of each epoch so that we can plot learning curves.<\/p>\n<pre class=\"crayon-plain-tag\">history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)<\/pre>\n<p>The <em>fit_model()<\/em> function ties these elements together, taking the train and test datasets as arguments and returning the fit model and training history.<\/p>\n<pre class=\"crayon-plain-tag\"># define and fit model on a training dataset\r\ndef fit_model(trainX, trainy, testX, testy):\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(3, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', 
metrics=['accuracy'])\r\n\t# fit model\r\n\thistory = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)\r\n\treturn model, history<\/pre>\n<p>We can call this function with the prepared dataset to obtain a fit model and the history collected during the training process.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model on train dataset\r\nmodel, history = fit_model(trainX, trainy, testX, testy)<\/pre>\n<p>Finally, we can summarize the performance of the model.<\/p>\n<p>The classification accuracy of the model on the train and test sets can be evaluated.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate the model\r\n_, train_acc = model.evaluate(trainX, trainy, verbose=0)\r\n_, test_acc = model.evaluate(testX, testy, verbose=0)\r\nprint('Train: %.3f, Test: %.3f' % (train_acc, test_acc))<\/pre>\n<p>The history collected during training can be used to create line plots showing both the loss and classification accuracy for the model on the train and test sets over each training epoch, providing learning curves.<\/p>\n<pre class=\"crayon-plain-tag\"># plot loss during training\r\npyplot.subplot(211)\r\npyplot.title('Loss')\r\npyplot.plot(history.history['loss'], label='train')\r\npyplot.plot(history.history['val_loss'], label='test')\r\npyplot.legend()\r\n# plot accuracy during training\r\npyplot.subplot(212)\r\npyplot.title('Accuracy')\r\npyplot.plot(history.history['acc'], label='train')\r\npyplot.plot(history.history['val_acc'], label='test')\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>The <em>summarize_model()<\/em> function below implements this, taking the fit model, training history, and dataset as arguments and printing the model performance and creating a plot of model learning curves.<\/p>\n<pre class=\"crayon-plain-tag\"># summarize the performance of the fit model\r\ndef summarize_model(model, history, trainX, trainy, testX, testy):\r\n\t# evaluate the model\r\n\t_, train_acc = model.evaluate(trainX, trainy, 
verbose=0)\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\tprint('Train: %.3f, Test: %.3f' % (train_acc, test_acc))\r\n\t# plot loss during training\r\n\tpyplot.subplot(211)\r\n\tpyplot.title('Loss')\r\n\tpyplot.plot(history.history['loss'], label='train')\r\n\tpyplot.plot(history.history['val_loss'], label='test')\r\n\tpyplot.legend()\r\n\t# plot accuracy during training\r\n\tpyplot.subplot(212)\r\n\tpyplot.title('Accuracy')\r\n\tpyplot.plot(history.history['acc'], label='train')\r\n\tpyplot.plot(history.history['val_acc'], label='test')\r\n\tpyplot.legend()\r\n\tpyplot.show()<\/pre>\n<p>We can call this function with the fit model and prepared data.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate model behavior\r\nsummarize_model(model, history, trainX, trainy, testX, testy)<\/pre>\n<p>At the end of the run, we can save the model to file so that we may load it later and use it as the basis for some transfer learning experiments.<\/p>\n<p>Note that saving the model to file requires that you have the <em>h5py<\/em> library installed. 
This library can be installed via <em>pip<\/em> as follows:<\/p>\n<pre class=\"crayon-plain-tag\">sudo pip install h5py<\/pre>\n<p>The fit model can be saved by calling the <em>save()<\/em> function on the model.<\/p>\n<pre class=\"crayon-plain-tag\"># save model to file\r\nmodel.save('model.h5')<\/pre>\n<p>Tying these elements together, the complete example of fitting an MLP on Problem 1, summarizing the model\u2019s performance, and saving the model to file is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># fit mlp model on problem 1 and save model to file\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.layers import Dense\r\nfrom keras.models import Sequential\r\nfrom keras.utils import to_categorical\r\nfrom matplotlib import pyplot\r\n\r\n# prepare blobs samples with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# define and fit model on a training dataset\r\ndef fit_model(trainX, trainy, testX, testy):\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(3, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# fit model\r\n\thistory = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)\r\n\treturn model, history\r\n\r\n# summarize the performance of the fit model\r\ndef summarize_model(model, history, trainX, 
trainy, testX, testy):\r\n\t# evaluate the model\r\n\t_, train_acc = model.evaluate(trainX, trainy, verbose=0)\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\tprint('Train: %.3f, Test: %.3f' % (train_acc, test_acc))\r\n\t# plot loss during training\r\n\tpyplot.subplot(211)\r\n\tpyplot.title('Loss')\r\n\tpyplot.plot(history.history['loss'], label='train')\r\n\tpyplot.plot(history.history['val_loss'], label='test')\r\n\tpyplot.legend()\r\n\t# plot accuracy during training\r\n\tpyplot.subplot(212)\r\n\tpyplot.title('Accuracy')\r\n\tpyplot.plot(history.history['acc'], label='train')\r\n\tpyplot.plot(history.history['val_acc'], label='test')\r\n\tpyplot.legend()\r\n\tpyplot.show()\r\n\r\n# prepare data\r\ntrainX, trainy, testX, testy = samples_for_seed(1)\r\n# fit model on train dataset\r\nmodel, history = fit_model(trainX, trainy, testX, testy)\r\n# evaluate model behavior\r\nsummarize_model(model, history, trainX, trainy, testX, testy)\r\n# save model to file\r\nmodel.save('model.h5')<\/pre>\n<p>Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the model performed well on Problem 1, achieving a classification accuracy of about 92% on both the train and test datasets.<\/p>\n<pre class=\"crayon-plain-tag\">Train: 0.916, Test: 0.920<\/pre>\n<p>A figure is also created summarizing the learning curves of the model, showing both the loss (top) and accuracy (bottom) for the model on both the train (blue) and test (orange) datasets at the end of each training epoch.<\/p>\n<p>Your plot may not look identical but is expected to show the same general behavior. 
If not, try running the example a few times.<\/p>\n<p>In this case, we can see that the model learned the problem reasonably quickly and well, perhaps converging in about 40 epochs and remaining reasonably stable on both datasets.<\/p>\n<div id=\"attachment_6975\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6975\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-1.png\" alt=\"Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 1\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-1.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-1-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-1-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-1-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 1<\/p>\n<\/div>\n<p>Now that we have seen how to develop a standalone MLP for the blobs Problem 1, we can look at doing the same for Problem 2, which can be used as a baseline.<\/p>\n<h2>Standalone MLP Model for Problem 2<\/h2>\n<p>The example in the previous section can be updated to fit an MLP model to Problem 2.<\/p>\n<p>It is important to get an idea of 
performance and learning dynamics on Problem 2 for a standalone model first, as this will provide a performance baseline that can be used for comparison with a model fit on the same problem using transfer learning.<\/p>\n<p>A single change is required: the call to <em>samples_for_seed()<\/em> must use a pseudorandom number generator seed of two instead of one.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare data\r\ntrainX, trainy, testX, testy = samples_for_seed(2)<\/pre>\n<p>For completeness, the full example with this change is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># fit mlp model on problem 2\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.layers import Dense\r\nfrom keras.models import Sequential\r\nfrom keras.utils import to_categorical\r\nfrom matplotlib import pyplot\r\n\r\n# prepare blobs samples with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# define and fit model on a training dataset\r\ndef fit_model(trainX, trainy, testX, testy):\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(3, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# fit model\r\n\thistory = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)\r\n\treturn model, history\r\n\r\n# summarize the 
performance of the fit model\r\ndef summarize_model(model, history, trainX, trainy, testX, testy):\r\n\t# evaluate the model\r\n\t_, train_acc = model.evaluate(trainX, trainy, verbose=0)\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\tprint('Train: %.3f, Test: %.3f' % (train_acc, test_acc))\r\n\t# plot loss during training\r\n\tpyplot.subplot(211)\r\n\tpyplot.title('Loss')\r\n\tpyplot.plot(history.history['loss'], label='train')\r\n\tpyplot.plot(history.history['val_loss'], label='test')\r\n\tpyplot.legend()\r\n\t# plot accuracy during training\r\n\tpyplot.subplot(212)\r\n\tpyplot.title('Accuracy')\r\n\tpyplot.plot(history.history['acc'], label='train')\r\n\tpyplot.plot(history.history['val_acc'], label='test')\r\n\tpyplot.legend()\r\n\tpyplot.show()\r\n\r\n# prepare data\r\ntrainX, trainy, testX, testy = samples_for_seed(2)\r\n# fit model on train dataset\r\nmodel, history = fit_model(trainX, trainy, testX, testy)\r\n# evaluate model behavior\r\nsummarize_model(model, history, trainX, trainy, testX, testy)<\/pre>\n<p>Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the model performed okay on Problem 2, but not as well as was seen on Problem 1, achieving a classification accuracy of about 79% on both the train and test datasets.<\/p>\n<pre class=\"crayon-plain-tag\">Train: 0.794, Test: 0.794<\/pre>\n<p>A figure is also created summarizing the learning curves of the model. Your plot may not look identical but is expected to show the same general behavior. If not, try running the example a few times.<\/p>\n<p>In this case, we can see that the model converged more slowly than we saw on Problem 1 in the previous section. 
This suggests that this version of the problem may be slightly more challenging, at least for the chosen model configuration.<\/p>\n<div id=\"attachment_6976\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6976\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-2.png\" alt=\"Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 2\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-2.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-2-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-2-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-on-Problem-2-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 2<\/p>\n<\/div>\n<p>Now that we have a baseline of performance and learning dynamics for an MLP on Problem 2, we can see how the addition of transfer learning affects the MLP on this problem.<\/p>\n<h2>MLP With Transfer Learning for Problem 2<\/h2>\n<p>The model that was fit on Problem 1 can be loaded and the weights can be used as the initial weights for a model fit on Problem 2.<\/p>\n<p>This is a type of transfer learning where learning on a different but 
related problem is used as a type of weight initialization scheme.<\/p>\n<p>This requires that the <em>fit_model()<\/em> function be updated to load the model and refit it on examples for Problem 2.<\/p>\n<p>The model saved in \u2018model.h5\u2019 can be loaded using the <em>load_model()<\/em> Keras function.<\/p>\n<pre class=\"crayon-plain-tag\"># load model\r\nmodel = load_model('model.h5')<\/pre>\n<p>Once loaded, the model can be compiled and fit as normal.<\/p>\n<p>The updated <em>fit_model()<\/em> with this change is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load and re-fit model on a training dataset\r\ndef fit_model(trainX, trainy, testX, testy):\r\n\t# load model\r\n\tmodel = load_model('model.h5')\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# re-fit model\r\n\thistory = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)\r\n\treturn model, history<\/pre>\n<p>We would expect a model that uses the weights from a model fit on a different but related problem to learn the problem faster in terms of the learning curve and perhaps to achieve lower generalization error, although these aspects will depend on the choice of problems and model.<\/p>\n<p>For completeness, the full example with this change is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># transfer learning with mlp model on problem 2\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import load_model\r\nfrom matplotlib import pyplot\r\n\r\n# prepare blobs samples with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\t# one hot encode output 
variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# load and re-fit model on a training dataset\r\ndef fit_model(trainX, trainy, testX, testy):\r\n\t# load model\r\n\tmodel = load_model('model.h5')\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# re-fit model\r\n\thistory = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)\r\n\treturn model, history\r\n\r\n# summarize the performance of the fit model\r\ndef summarize_model(model, history, trainX, trainy, testX, testy):\r\n\t# evaluate the model\r\n\t_, train_acc = model.evaluate(trainX, trainy, verbose=0)\r\n\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\tprint('Train: %.3f, Test: %.3f' % (train_acc, test_acc))\r\n\t# plot loss during training\r\n\tpyplot.subplot(211)\r\n\tpyplot.title('Loss')\r\n\tpyplot.plot(history.history['loss'], label='train')\r\n\tpyplot.plot(history.history['val_loss'], label='test')\r\n\tpyplot.legend()\r\n\t# plot accuracy during training\r\n\tpyplot.subplot(212)\r\n\tpyplot.title('Accuracy')\r\n\tpyplot.plot(history.history['acc'], label='train')\r\n\tpyplot.plot(history.history['val_acc'], label='test')\r\n\tpyplot.legend()\r\n\tpyplot.show()\r\n\r\n# prepare data\r\ntrainX, trainy, testX, testy = samples_for_seed(2)\r\n# fit model on train dataset\r\nmodel, history = fit_model(trainX, trainy, testX, testy)\r\n# evaluate model behavior\r\nsummarize_model(model, history, trainX, trainy, testX, testy)<\/pre>\n<p>Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. 
Try running the example a few times.<\/p>\n<p>In this case, we can see that the model achieved a lower generalization error, reaching an accuracy of about 81% on the test dataset for Problem 2, compared to the standalone model that achieved about 79% accuracy.<\/p>\n<pre class=\"crayon-plain-tag\">Train: 0.786, Test: 0.810<\/pre>\n<p>A figure is also created summarizing the learning curves of the model. Your plot may not look identical but is expected to show the same general behavior. If not, try running the example a few times.<\/p>\n<p>In this case, we can see that the model appears to have a similar learning curve, although the curve for the test set (orange line) shows clear improvements: better performance earlier (from about epoch 20 onward) and performance that rises above that of the model on the training set.<\/p>\n<div id=\"attachment_6977\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6977\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-with-Transfer-Learning-on-Problem-2.png\" alt=\"Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP With Transfer Learning on Problem 2\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-with-Transfer-Learning-on-Problem-2.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-with-Transfer-Learning-on-Problem-2-300x225.png 300w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-with-Transfer-Learning-on-Problem-2-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Loss-and-Accuracy-Learning-Curves-on-the-Train-and-Test-Sets-for-an-MLP-with-Transfer-Learning-on-Problem-2-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP With Transfer Learning on Problem 2<\/p>\n<\/div>\n<p>We have only looked at single runs of a standalone MLP model and an MLP with transfer learning.<\/p>\n<p>Neural network algorithms are stochastic, therefore an average of performance across multiple runs is required to see if the observed behavior is real or a statistical fluke.<\/p>\n<h2>Comparison of Models on Problem 2<\/h2>\n<p>In order to determine whether using transfer learning for the blobs multi-class classification problem has a real effect, we must repeat each experiment multiple times and analyze the average performance across the repeats.<\/p>\n<p>We will compare the performance of the standalone model trained on Problem 2 to a model using transfer learning, averaged over 30 repeats.<\/p>\n<p>Further, we will investigate whether keeping the weights in some of the layers fixed improves model performance.<\/p>\n<p>The model trained on Problem 1 has two hidden layers. 
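<\/p>\n<p>As used in the experiments below, a layer in a Keras model is kept fixed by setting its <em>trainable<\/em> attribute to False before the model is compiled. The sketch below is a minimal illustration of this mechanism; it defines a fresh model of the same shape used in this tutorial (an assumption, standing in for the saved \u2018model.h5\u2019 file so the snippet is self-contained):<\/p>

```python
# minimal sketch: freezing a layer in Keras (illustration only; a freshly
# defined model of the tutorial's shape stands in for the saved 'model.h5')
from keras.models import Sequential
from keras.layers import Dense

# define a 2-input MLP with two hidden layers and a 3-class softmax output
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

# freeze the first hidden layer; its weights will not be updated by fit()
model.layers[0].trainable = False
# the change takes effect once the model is (re-)compiled
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# each Dense layer holds a kernel and a bias: of the six weight tensors,
# the first hidden layer's two are now excluded from training
print(len(model.trainable_weights), len(model.non_trainable_weights))
```

<p>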
By keeping the first or the first and second hidden layers fixed, the layers with unchangeable weights will act as a feature extractor and may provide features that make learning Problem 2 easier, affecting the speed of learning and\/or the accuracy of the model on the test set.<\/p>\n<p>As a first step, we will simplify the <em>fit_model()<\/em>\u00a0function to fit the model and discard any training history so that we can focus on the final accuracy of the trained model.<\/p>\n<pre class=\"crayon-plain-tag\"># define and fit model on a training dataset\r\ndef fit_model(trainX, trainy):\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(3, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# fit model\r\n\tmodel.fit(trainX, trainy, epochs=100, verbose=0)\r\n\treturn model<\/pre>\n<p>Next, we can develop a function that will repeatedly fit a new standalone model on the Problem 2 training dataset and evaluate its accuracy on the test set.<\/p>\n<p>The <em>eval_standalone_model()<\/em> function below implements this, taking the train and test sets as arguments, as well as the number of repeats, and returning a list of accuracy scores for models on the test dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># repeated evaluation of a standalone model\r\ndef eval_standalone_model(trainX, trainy, testX, testy, n_repeats):\r\n\tscores = list()\r\n\tfor _ in range(n_repeats):\r\n\t\t# define and fit a new model on the train dataset\r\n\t\tmodel = fit_model(trainX, trainy)\r\n\t\t# evaluate model on test dataset\r\n\t\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\t\tscores.append(test_acc)\r\n\treturn scores<\/pre>\n<p>Summarizing the distribution of accuracy scores returned from this function will 
give an idea of how well the chosen standalone model performs on Problem 2.<\/p>\n<pre class=\"crayon-plain-tag\"># repeated evaluation of standalone model\r\nstandalone_scores = eval_standalone_model(trainX, trainy, testX, testy, n_repeats)\r\nprint('Standalone %.3f (%.3f)' % (mean(standalone_scores), std(standalone_scores)))<\/pre>\n<p>Next, we need an equivalent function for evaluating a model using transfer learning.<\/p>\n<p>In each loop, the model trained on Problem 1 must be loaded from file, fit on the training dataset for Problem 2, then evaluated on the test set for Problem 2.<\/p>\n<p>In addition, we will configure 0, 1, or 2 of the hidden layers in the loaded model to remain fixed. Keeping 0 hidden layers fixed means that all of the weights in the model will be adapted when learning Problem 2, using transfer learning as a weight initialization scheme. In contrast, keeping both (2) of the hidden layers fixed means that only the output layer of the model will be adapted during training, using transfer learning as a feature extraction method.<\/p>\n<p>The <em>eval_transfer_model()<\/em> function below implements this, taking the train and test datasets for Problem 2 as arguments, as well as the number of hidden layers in the loaded model to keep fixed and the number of times to repeat the experiment.<\/p>\n<p>The function returns a list of test accuracy scores; summarizing this distribution will give a reasonable idea of how well the model with the chosen type of transfer learning performs on Problem 2.<\/p>\n<pre class=\"crayon-plain-tag\"># repeated evaluation of a model with transfer learning\r\ndef eval_transfer_model(trainX, trainy, testX, testy, n_fixed, n_repeats):\r\n\tscores = list()\r\n\tfor _ in range(n_repeats):\r\n\t\t# load model\r\n\t\tmodel = load_model('model.h5')\r\n\t\t# mark layer weights as fixed or not trainable\r\n\t\tfor i in range(n_fixed):\r\n\t\t\tmodel.layers[i].trainable = False\r\n\t\t# re-compile 
model\r\n\t\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t\t# fit model on train dataset\r\n\t\tmodel.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t\t# evaluate model on test dataset\r\n\t\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\t\tscores.append(test_acc)\r\n\treturn scores<\/pre>\n<p>We can call this function repeatedly, setting n_fixed to 0, 1, and 2 in a loop and summarizing performance as we go; for example:<\/p>\n<pre class=\"crayon-plain-tag\"># repeated evaluation of transfer learning model, vary fixed layers\r\nn_fixed = 3\r\nfor i in range(n_fixed):\r\n\tscores = eval_transfer_model(trainX, trainy, testX, testy, i, n_repeats)\r\n\tprint('Transfer (fixed=%d) %.3f (%.3f)' % (i, mean(scores), std(scores)))<\/pre>\n<p>In addition to reporting the mean and standard deviation of each model, we can collect all scores and create a box and whisker plot to summarize and compare the distributions of model scores.<\/p>\n<p>Tying all of these elements together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># compare standalone mlp model performance to transfer learning\r\nfrom sklearn.datasets import make_blobs\r\nfrom keras.layers import Dense\r\nfrom keras.models import Sequential\r\nfrom keras.optimizers import SGD\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import load_model\r\nfrom matplotlib import pyplot\r\nfrom numpy import mean\r\nfrom numpy import std\r\n\r\n# prepare blobs examples with a given random seed\r\ndef samples_for_seed(seed):\r\n\t# generate samples\r\n\tX, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)\r\n\t# one hot encode output variable\r\n\ty = to_categorical(y)\r\n\t# split into train and test\r\n\tn_train = 500\r\n\ttrainX, testX = X[:n_train, :], X[n_train:, :]\r\n\ttrainy, testy = y[:n_train], y[n_train:]\r\n\treturn trainX, trainy, testX, testy\r\n\r\n# define 
and fit model on a training dataset\r\ndef fit_model(trainX, trainy):\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(3, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t# fit model\r\n\tmodel.fit(trainX, trainy, epochs=100, verbose=0)\r\n\treturn model\r\n\r\n# repeated evaluation of a standalone model\r\ndef eval_standalone_model(trainX, trainy, testX, testy, n_repeats):\r\n\tscores = list()\r\n\tfor _ in range(n_repeats):\r\n\t\t# define and fit a new model on the train dataset\r\n\t\tmodel = fit_model(trainX, trainy)\r\n\t\t# evaluate model on test dataset\r\n\t\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\t\tscores.append(test_acc)\r\n\treturn scores\r\n\r\n# repeated evaluation of a model with transfer learning\r\ndef eval_transfer_model(trainX, trainy, testX, testy, n_fixed, n_repeats):\r\n\tscores = list()\r\n\tfor _ in range(n_repeats):\r\n\t\t# load model\r\n\t\tmodel = load_model('model.h5')\r\n\t\t# mark layer weights as fixed or not trainable\r\n\t\tfor i in range(n_fixed):\r\n\t\t\tmodel.layers[i].trainable = False\r\n\t\t# re-compile model\r\n\t\tmodel.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\r\n\t\t# fit model on train dataset\r\n\t\tmodel.fit(trainX, trainy, epochs=100, verbose=0)\r\n\t\t# evaluate model on test dataset\r\n\t\t_, test_acc = model.evaluate(testX, testy, verbose=0)\r\n\t\tscores.append(test_acc)\r\n\treturn scores\r\n\r\n# prepare data for problem 2\r\ntrainX, trainy, testX, testy = samples_for_seed(2)\r\nn_repeats = 30\r\ndists, dist_labels = list(), list()\r\n\r\n# repeated evaluation of standalone model\r\nstandalone_scores = eval_standalone_model(trainX, trainy, testX, testy, 
n_repeats)\r\nprint('Standalone %.3f (%.3f)' % (mean(standalone_scores), std(standalone_scores)))\r\ndists.append(standalone_scores)\r\ndist_labels.append('standalone')\r\n\r\n# repeated evaluation of transfer learning model, vary fixed layers\r\nn_fixed = 3\r\nfor i in range(n_fixed):\r\n\tscores = eval_transfer_model(trainX, trainy, testX, testy, i, n_repeats)\r\n\tprint('Transfer (fixed=%d) %.3f (%.3f)' % (i, mean(scores), std(scores)))\r\n\tdists.append(scores)\r\n\tdist_labels.append('transfer f='+str(i))\r\n\r\n# box and whisker plot of score distributions\r\npyplot.boxplot(dists, labels=dist_labels)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the mean and standard deviation of classification accuracy on the test dataset for each model.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the standalone model achieved an accuracy of about 78% on Problem 2 with a large standard deviation of 10%. In contrast, we can see that the spread of all of the transfer learning models is much smaller, with standard deviations ranging from about 0.4% to 1.4%.<\/p>\n<p>This difference in the standard deviations of the test accuracy scores shows the stability that transfer learning can bring to the model, reducing the variance in the performance of the final model introduced by the stochastic learning algorithm.<\/p>\n<p>Comparing the mean test accuracy of the models, we can see that transfer learning used as a weight initialization scheme (fixed=0) resulted in better performance than the standalone model, with about 80% accuracy.<\/p>\n<p>Keeping all hidden layers fixed (fixed=2) and using them as a feature extraction scheme resulted in worse performance on average than the standalone model. 
This suggests that the approach is too restrictive in this case.<\/p>\n<p>Interestingly, we see the best performance when the first hidden layer is kept fixed (fixed=1) and the second hidden layer is adapted to the problem, with a test classification accuracy of about 81%. This suggests that in this case, the problem benefits from both the feature extraction and weight initialization properties of transfer learning.<\/p>\n<p>It may be interesting to see how the results of this last approach compare to the same model where the weights of the second hidden layer (and perhaps the output layer) are re-initialized with random numbers. This comparison would demonstrate whether the feature extraction properties of transfer learning alone are beneficial, or whether both the feature extraction and weight initialization properties are required.<\/p>\n<pre class=\"crayon-plain-tag\">Standalone 0.787 (0.101)\r\nTransfer (fixed=0) 0.805 (0.004)\r\nTransfer (fixed=1) 0.817 (0.005)\r\nTransfer (fixed=2) 0.750 (0.014)<\/pre>\n<p>A figure is created showing four box and whisker plots. 
The box shows the middle 50% of each data distribution, the orange line shows the median, and the dots show outliers.<\/p>\n<p>The boxplot for the standalone model shows a number of outliers, indicating that on average, the model performs well, but there is a chance that it can perform very poorly.<\/p>\n<p>Conversely, we see that the behavior of the models with transfer learning is more stable, showing a tighter distribution in performance.<\/p>\n<div id=\"attachment_6978\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6978\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/Box-and-Whisker-Plot-Comparing-Standalone-and-Transfer-Learning-Models-via-Test-Set-Accuracy-on-the-Blobs-Multiclass-Classification-Problem.png\" alt=\"Box and Whisker Plot Comparing Standalone and Transfer Learning Models via Test Set Accuracy on the Blobs Multiclass Classification Problem\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Box-and-Whisker-Plot-Comparing-Standalone-and-Transfer-Learning-Models-via-Test-Set-Accuracy-on-the-Blobs-Multiclass-Classification-Problem.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Box-and-Whisker-Plot-Comparing-Standalone-and-Transfer-Learning-Models-via-Test-Set-Accuracy-on-the-Blobs-Multiclass-Classification-Problem-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Box-and-Whisker-Plot-Comparing-Standalone-and-Transfer-Learning-Models-via-Test-Set-Accuracy-on-the-Blobs-Multiclass-Classification-Problem-768x576.png 768w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/Box-and-Whisker-Plot-Comparing-Standalone-and-Transfer-Learning-Models-via-Test-Set-Accuracy-on-the-Blobs-Multiclass-Classification-Problem-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p class=\"wp-caption-text\">Box and Whisker Plot Comparing Standalone and Transfer Learning Models via Test Set Accuracy on the Blobs Multiclass Classification Problem<\/p>\n<\/div>\n<h2>Extensions<\/h2>\n<p>This section lists some ideas for extending the tutorial that you may wish to explore.<\/p>\n<ul>\n<li><strong>Reverse Experiment<\/strong>. Train and save a model for Problem 2 and see if it can help when using it for transfer learning on Problem 1.<\/li>\n<li><strong>Add Hidden Layer<\/strong>. Update the example to keep both hidden layers fixed, but add a new hidden layer with randomly initialized weights after the fixed layers before the output layer and compare performance.<\/li>\n<li><strong>Randomly Initialize Layers<\/strong>. 
Update the example to randomly initialize the weights of the second hidden layer and the output layer and compare performance.<\/li>\n<\/ul>\n<p>If you explore any of these extensions, I\u2019d love to know.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Posts<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/transfer-learning-for-deep-learning\/\">A Gentle Introduction to Transfer Learning for Deep Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/develop-a-deep-learning-caption-generation-model-in-python\/\">How to Develop a Deep Learning Photo Caption Generator from Scratch<\/a><\/li>\n<\/ul>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"http:\/\/proceedings.mlr.press\/v27\/bengio12a.html\">Deep Learning of Representations for Unsupervised and Transfer Learning<\/a>, 2011.<\/li>\n<li><a href=\"https:\/\/dl.acm.org\/citation.cfm?id=3104547\">Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach<\/a>, 2011.<\/li>\n<li><a href=\"http:\/\/papers.nips.cc\/paper\/1034-is-learning-the-n-th-thing-any-easier-than-learning-the-first.pdf\">Is Learning The n-th Thing Any Easier Than Learning The First?<\/a>, 1996.<\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li>Section 15.2 Transfer Learning and Domain Adaptation, <a href=\"https:\/\/amzn.to\/2NJW3gE\">Deep Learning<\/a>, 2016.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Transfer_learning\">Transfer learning, Wikipedia<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use transfer learning to improve the performance of deep learning neural networks in Python with Keras.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Transfer learning is a method for reusing a model trained on a related predictive modeling problem.<\/li>\n<li>Transfer learning can be used to accelerate the training of neural networks as either a 
weight initialization scheme or feature extraction method.<\/li>\n<li>How to use transfer learning to improve the performance of an MLP for a multiclass classification problem.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks\/\">How to Improve Performance With Transfer Learning for Deep Learning Neural Networks<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee An interesting benefit of deep learning neural networks is that they can be reused on related problems. 
Transfer learning refers to a [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/07\/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":1693,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1692"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1692"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1692\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/1693"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1692"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1692"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1692"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}