{"id":4457,"date":"2021-03-04T18:00:14","date_gmt":"2021-03-04T18:00:14","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/04\/how-to-update-neural-network-models-with-more-data\/"},"modified":"2021-03-04T18:00:14","modified_gmt":"2021-03-04T18:00:14","slug":"how-to-update-neural-network-models-with-more-data","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/04\/how-to-update-neural-network-models-with-more-data\/","title":{"rendered":"How to Update Neural Network Models With More Data"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Deep learning neural network models used for predictive modeling may need to be updated.<\/p>\n<p>This may be because the data has changed since the model was developed and deployed, or it may be the case that additional labeled data has been made available since the model was developed and it is expected that the additional data will improve the performance of the model.<\/p>\n<p>It is important to experiment and evaluate with a range of different approaches when updating neural network models for new data, especially if model updating will be automated, such as on a periodic schedule.<\/p>\n<p>There are many ways to <strong>update neural network models<\/strong>, although the two main approaches involve either using the existing model as a starting point and retraining it, or leaving the existing model unchanged and combining the predictions from the existing model with a new model.<\/p>\n<p>In this tutorial, you will discover how to update deep learning neural network models in response to new data.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.<\/li>\n<li>How to update trained neural network models with just new data or combinations of old and new data.<\/li>\n<li>How to create an ensemble of existing and new 
models trained on just new data or combinations of old and new data.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_12245\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12245\" loading=\"lazy\" class=\"size-full wp-image-12245\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/06\/How-to-Update-Neural-Network-Models-With-More-Data.jpg\" alt=\"How to Update Neural Network Models With More Data\" width=\"799\" height=\"533\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/06\/How-to-Update-Neural-Network-Models-With-More-Data.jpg 799w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/06\/How-to-Update-Neural-Network-Models-With-More-Data-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/06\/How-to-Update-Neural-Network-Models-With-More-Data-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-12245\" class=\"wp-caption-text\">How to Update Neural Network Models With More Data<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/52450054@N04\/8454104835\/\">Judy Gallagher<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Updating Neural Network Models<\/li>\n<li>Retraining Update Strategies\n<ol>\n<li>Update Model on New Data Only<\/li>\n<li>Update Model on Old and New Data<\/li>\n<\/ol>\n<\/li>\n<li>Ensemble Update Strategies\n<ol>\n<li>Ensemble Model With Model on New Data Only<\/li>\n<li>Ensemble Model With Model on Old and New Data<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Updating Neural Network Models<\/h2>\n<p>Selecting and finalizing a deep learning neural network model for a predictive modeling project is just the beginning.<\/p>\n<p>You can then start using the model to make predictions on new data.<\/p>\n<p>One possible problem that 
you may encounter is that the nature of the prediction problem may change over time.<\/p>\n<p>You may notice this when the effectiveness of predictions begins to decline over time. This may be because the assumptions made and captured in the model are changing or no longer hold.<\/p>\n<p>Generally, this is referred to as the problem of \u201c<em>concept drift<\/em>,\u201d where the underlying probability distributions of variables and relationships between variables change over time, which can negatively impact the model built from the data.<\/p>\n<p>For more on concept drift, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-concept-drift-machine-learning\/\">A Gentle Introduction to Concept Drift in Machine Learning<\/a><\/li>\n<\/ul>\n<p>Concept drift may affect your model at different times and depends specifically on the prediction problem you are solving and the model chosen to address it.<\/p>\n<p>It can be helpful to monitor the performance of a model over time and use a clear drop in model performance as a trigger to make a change to your model, such as re-training it on new data.<\/p>\n<p>Alternatively, you may know that data in your domain changes frequently enough that a change to the model is required periodically, such as weekly, monthly, or annually.<\/p>\n<p>Finally, you may operate your model for a while and accumulate additional data with known outcomes that you wish to use to update your model, with the hope of improving predictive performance.<\/p>\n<p>Importantly, you have a lot of flexibility when it comes to responding to a change to the problem or the availability of new data.<\/p>\n<p>For example, you can take the trained neural network model and update the model weights using the new data.<\/p>\n<p>
Or you might want to leave the existing model untouched and combine its predictions with a new model fit on the newly available data.<\/p>\n<p>These approaches represent two general themes in updating neural network models in response to new data; they are:<\/p>\n<ul>\n<li>Retraining Update Strategies.<\/li>\n<li>Ensemble Update Strategies.<\/li>\n<\/ul>\n<p>Let\u2019s take a closer look at each in turn.<\/p>\n<h2>Retraining Update Strategies<\/h2>\n<p>A benefit of neural network models is that their weights can be updated at any time with continued training.<\/p>\n<p>When responding to changes in the underlying data or the availability of new data, there are a few different strategies to choose from when updating a neural network model, such as:<\/p>\n<ul>\n<li>Continue training the model on the new data only.<\/li>\n<li>Continue training the model on the old and new data.<\/li>\n<\/ul>\n<p>We might also imagine variations on the above strategies, such as using a sample of the new data or a sample of new and old data instead of all available data, as well as possible instance-based weightings on sampled data.<\/p>\n<p>We might also consider extensions of the model that freeze the layers of the existing model (i.e. so the model weights cannot change during training), then add new layers with weights that can change, grafting on extensions to the model to handle any change in the data. Perhaps this is a hybrid of retraining and the ensemble approach described in the next section, so we\u2019ll leave it aside for now.<\/p>\n<p>Nevertheless, these are the two main strategies to consider.<\/p>\n<p>Let\u2019s make these approaches concrete with a worked example.<\/p>\n<h3>Update Model on New Data Only<\/h3>\n<p>We can update the model on the new data only.<\/p>\n<p>One extreme version of this approach is to not use any new data and simply re-train the model on the old data. This is effectively the same as \u201c<em>do nothing<\/em>\u201d in response to the new data.<\/p>\n<p>
At the other extreme, a model could be fit on the new data only, discarding the old data and old model.<\/p>\n<ul>\n<li>Ignore new data, do nothing.<\/li>\n<li>Update existing model on new data.<\/li>\n<li>Fit new model on new data, discard old model and data.<\/li>\n<\/ul>\n<p>We will focus on the middle ground in this example, but it might be interesting to test all three approaches on your problem and see what works best.<\/p>\n<p>First, we can define a synthetic binary classification dataset and split it into half, then use one portion as \u201c<em>old data<\/em>\u201d and another portion as \u201c<em>new data<\/em>.\u201d<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)<\/pre>\n<p>We can then define a Multilayer Perceptron model (MLP) and fit it on the old data only.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nmodel.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nmodel.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)<\/pre>\n<p>We can then imagine saving the model and using it for some time.<\/p>\n<p>Time passes, and we wish to update it on new data that has become available.<\/p>\n<p>This would involve using a much smaller learning rate than normal so that we do not wash away the weights 
learned on the old data.<\/p>\n<p><strong>Note<\/strong>: you will need to discover a learning rate that is appropriate for your model and dataset, one that achieves better performance than simply fitting a new model from scratch.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# update model on new data only with a smaller learning rate\r\nopt = SGD(learning_rate=0.001, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')<\/pre>\n<p>We can then fit the model on the new data only with this smaller learning rate.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on new data\r\nmodel.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)<\/pre>\n<p>Tying this together, the complete example of updating a neural network model on new data only is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># update neural network with new data only\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom tensorflow.keras.models import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nmodel.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, 
loss='binary_crossentropy')\r\n# fit the model on old data\r\nmodel.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)\r\n\r\n# save model...\r\n\r\n# load model...\r\n\r\n# update model on new data only with a smaller learning rate\r\nopt = SGD(learning_rate=0.001, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on new data\r\nmodel.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)<\/pre>\n<p>Next, let\u2019s look at updating the model on new and old data.<\/p>\n<h3>Update Model on Old and New Data<\/h3>\n<p>We can update the model on a combination of both old and new data.<\/p>\n<p>An extreme version of this approach is to discard the model and simply fit a new model on all available data, new and old. A less extreme version would be to use the existing model as a starting point and update it based on the combined dataset.<\/p>\n<p>Again, it is a good idea to test both strategies and see what works well for your dataset.<\/p>\n<p>We will focus on the less extreme update strategy in this case.<\/p>\n<p>The synthetic dataset and model can be fit on the old dataset as before.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nmodel.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# 
fit the model on old data\r\nmodel.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)<\/pre>\n<p>New data becomes available and we wish to update the model on a combination of both old and new data.<\/p>\n<p>First, we must use a much smaller learning rate so that the current weights can serve as the starting point for the search.<\/p>\n<p><strong>Note<\/strong>: you will need to discover a learning rate that is appropriate for your model and dataset, one that achieves better performance than simply fitting a new model from scratch.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# update model with a smaller learning rate\r\nopt = SGD(learning_rate=0.001, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')<\/pre>\n<p>We can then create a composite dataset composed of old and new data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create a composite dataset of old and new data\r\nX_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))<\/pre>\n<p>Finally, we can update the model on this composite dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# fit the model on old and new data\r\nmodel.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)<\/pre>\n<p>Tying this together, the complete example of updating a neural network model on both old and new data is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># update neural network with both old and new data\r\nfrom numpy import vstack\r\nfrom numpy import hstack\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom tensorflow.keras.models import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nmodel.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nmodel.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)\r\n\r\n# save model...\r\n\r\n# load model...\r\n\r\n# update model with a smaller learning rate\r\nopt = SGD(learning_rate=0.001, momentum=0.9)\r\n# compile the model\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')\r\n# create a composite dataset of old and new data\r\nX_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))\r\n# fit the model on old and new data\r\nmodel.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)<\/pre>\n<p>Next, let\u2019s look at how to use ensemble models to respond to new data.<\/p>\n<h2>Ensemble Update Strategies<\/h2>\n<p>An ensemble is a predictive model that is composed of multiple other models.<\/p>\n<p>There are many different types of ensemble models, although perhaps the simplest approach is to average the predictions from multiple different models.<\/p>\n<p>For more on ensemble algorithms for deep learning neural networks, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/ensemble-methods-for-deep-learning-neural-networks\/\">Ensemble Learning Methods for Deep Learning Neural Networks<\/a><\/li>\n<\/ul>\n<p>We can use an ensemble model as a strategy when responding to changes in the underlying data or availability of new data.<\/p>\n<p>Mirroring the approaches in the previous section, we might consider two 
approaches to ensemble learning algorithms as strategies for responding to new data; they are:<\/p>\n<ul>\n<li>Ensemble of existing model and new model fit on new data only.<\/li>\n<li>Ensemble of existing model and new model fit on old and new data.<\/li>\n<\/ul>\n<p>Again, we might consider variations on these approaches, such as samples of old and new data, and more than one existing or additional models included in the ensemble.<\/p>\n<p>Nevertheless, these are the two main strategies to consider.<\/p>\n<p>Let\u2019s make these approaches concrete with a worked example.<\/p>\n<h3>Ensemble Model With Model on New Data Only<\/h3>\n<p>We can create an ensemble of the existing model and a new model fit on only the new data.<\/p>\n<p>The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.<\/p>\n<p>First, we can prepare the dataset and fit the old model, as we did in the previous sections.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the old model\r\nold_model = Sequential()\r\nold_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nold_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nold_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nold_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nold_model.fit(X_old, y_old, epochs=150, 
batch_size=32, verbose=0)<\/pre>\n<p>Some time passes and new data becomes available.<\/p>\n<p>We can then fit a new model on the new data, ideally discovering a model and configuration that works well or best on the new dataset.<\/p>\n<p>In this case, we\u2019ll simply use the same model architecture and configuration as the old model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the new model\r\nnew_model = Sequential()\r\nnew_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nnew_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nnew_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nnew_model.compile(optimizer=opt, loss='binary_crossentropy')<\/pre>\n<p>We can then fit this new model on the new data only.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# fit the model on new data\r\nnew_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)<\/pre>\n<p>Now that we have the two models, we can make predictions with each model, and calculate the average of the predictions as the \u201c<em>ensemble prediction<\/em>.\u201d<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# make predictions with both models\r\nyhat1 = old_model.predict(X_new)\r\nyhat2 = new_model.predict(X_new)\r\n# combine predictions into single array\r\ncombined = hstack((yhat1, yhat2))\r\n# calculate outcome as mean of predictions\r\nyhat = mean(combined, axis=-1)<\/pre>\n<p>Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on new data only is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># ensemble old neural network with new model fit on new data only\r\nfrom numpy import hstack\r\nfrom numpy import mean\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom tensorflow.keras.models import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the old model\r\nold_model = Sequential()\r\nold_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nold_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nold_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nold_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nold_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)\r\n\r\n# save model...\r\n\r\n# load model...\r\n\r\n# define the new model\r\nnew_model = Sequential()\r\nnew_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nnew_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nnew_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nnew_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on new data\r\nnew_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)\r\n\r\n# make predictions with both models\r\nyhat1 = old_model.predict(X_new)\r\nyhat2 = new_model.predict(X_new)\r\n# combine predictions into single array\r\ncombined = hstack((yhat1, yhat2))\r\n# calculate outcome as mean of predictions\r\nyhat = mean(combined, axis=-1)<\/pre>\n<h3>Ensemble Model With 
Model on Old and New Data<\/h3>\n<p>We can create an ensemble of the existing model and a new model fit on both the old and the new data.<\/p>\n<p>The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.<\/p>\n<p>First, we can prepare the dataset and fit the old model, as we did in the previous sections.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the old model\r\nold_model = Sequential()\r\nold_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nold_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nold_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nold_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nold_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)<\/pre>\n<p>Some time passes and new data becomes available.<\/p>\n<p>We can then fit a new model on a composite of the old and new data, ideally discovering a model and configuration that works well or best on the combined dataset.<\/p>\n<p>In this case, we\u2019ll simply use the same model architecture and configuration as the old model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define the new model\r\nnew_model = Sequential()\r\nnew_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nnew_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nnew_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nnew_model.compile(optimizer=opt, loss='binary_crossentropy')<\/pre>\n<p>We can create a composite dataset from the old and new data, then fit the new model on this dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create a composite dataset of old and new data\r\nX_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))\r\n# fit the model on old and new data\r\nnew_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)<\/pre>\n<p>Finally, we can use both models together to make ensemble predictions.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# make predictions with both models\r\nyhat1 = old_model.predict(X_new)\r\nyhat2 = new_model.predict(X_new)\r\n# combine predictions into single array\r\ncombined = hstack((yhat1, yhat2))\r\n# calculate outcome as mean of predictions\r\nyhat = mean(combined, axis=-1)<\/pre>\n<p>Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on the old and new data is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># ensemble old neural network with new model fit on old and new data\r\nfrom numpy import hstack\r\nfrom numpy import vstack\r\nfrom numpy import mean\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom tensorflow.keras.models import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# record the number of input features in the data\r\nn_features = X.shape[1]\r\n# split into old and new data\r\nX_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)\r\n# define the old model\r\nold_model = Sequential()\r\nold_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nold_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nold_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nold_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# fit the model on old data\r\nold_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)\r\n\r\n# save model...\r\n\r\n# load model...\r\n\r\n# define the new model\r\nnew_model = Sequential()\r\nnew_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))\r\nnew_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))\r\nnew_model.add(Dense(1, activation='sigmoid'))\r\n# define the optimization algorithm\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\n# compile the model\r\nnew_model.compile(optimizer=opt, loss='binary_crossentropy')\r\n# create a composite dataset of old and new data\r\nX_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))\r\n# fit the model on old and new data\r\nnew_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)\r\n\r\n# make predictions with both models\r\nyhat1 = old_model.predict(X_new)\r\nyhat2 = new_model.predict(X_new)\r\n# combine predictions into single array\r\ncombined = hstack((yhat1, yhat2))\r\n# calculate outcome as mean of predictions\r\nyhat = mean(combined, axis=-1)<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-concept-drift-machine-learning\/\">A Gentle Introduction to Concept Drift in Machine Learning<\/a><\/li>\n<li><a 
href=\"https:\/\/machinelearningmastery.com\/ensemble-methods-for-deep-learning-neural-networks\/\">Ensemble Learning Methods for Deep Learning Neural Networks<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to update deep learning neural network models in response to new data.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.<\/li>\n<li>How to update trained neural network models with just new data or combinations of old and new data.<\/li>\n<li>How to create an ensemble of existing and new models trained on just new data or combinations of old and new data.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/update-neural-network-models-with-more-data\/\">How to Update Neural Network Models With More Data<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/update-neural-network-models-with-more-data\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Deep learning neural network models used for predictive modeling may need to be updated. 
This may be because the data has changed [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/04\/how-to-update-neural-network-models-with-more-data\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4458,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4457"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4457"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4457\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4458"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}