{"id":1289,"date":"2018-11-13T18:00:06","date_gmt":"2018-11-13T18:00:06","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/11\/13\/how-to-develop-lstm-models-for-time-series-forecasting\/"},"modified":"2018-11-13T18:00:06","modified_gmt":"2018-11-13T18:00:06","slug":"how-to-develop-lstm-models-for-time-series-forecasting","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/11\/13\/how-to-develop-lstm-models-for-time-series-forecasting\/","title":{"rendered":"How to Develop LSTM Models for Time Series Forecasting"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting.<\/p>\n<p>There are many types of LSTM models that can be used for each specific type of time series forecasting problem.<\/p>\n<p>In this tutorial, you will discover how to develop a suite of LSTM models for a range of standard time series forecasting problems.<\/p>\n<p>The objective of this tutorial is to provide standalone examples of each model on each type of time series problem as a template that you can copy and adapt for your specific time series forecasting problem.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to develop LSTM models for univariate time series forecasting.<\/li>\n<li>How to develop LSTM models for multivariate time series forecasting.<\/li>\n<li>How to develop LSTM models for multi-step time series forecasting.<\/li>\n<\/ul>\n<p>This is a large and important post; you may want to bookmark it for future reference.<\/p>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_6436\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6436\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/11\/How-to-Develop-LSTM-Models-for-Time-Series-Forecasting.jpg\" alt=\"How to Develop LSTM Models for Time Series Forecasting\" width=\"640\" height=\"360\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/How-to-Develop-LSTM-Models-for-Time-Series-Forecasting.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/11\/How-to-Develop-LSTM-Models-for-Time-Series-Forecasting-300x169.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">How to Develop LSTM Models for Time Series Forecasting<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/15216811@N06\/6704346543\/\">N i c o l a<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>In this tutorial, we will explore how to develop a suite of different types of LSTM models for time series forecasting.<\/p>\n<p>The models are demonstrated on small contrived time series problems intended to give the flavor of the type of time series problem being addressed. 
The chosen configuration of the models is arbitrary and not optimized for each problem; that was not the goal.<\/p>\n<p>This tutorial is divided into four parts; they are:<\/p>\n<ol>\n<li>Univariate LSTM Models<\/li>\n<li>Multivariate LSTM Models<\/li>\n<li>Multi-Step LSTM Models<\/li>\n<li>Multivariate Multi-Step LSTM Models<\/li>\n<\/ol>\n<h2>Univariate LSTM Models<\/h2>\n<p>LSTMs can be used to model univariate time series forecasting problems.<\/p>\n<p>These are problems comprised of a single series of observations, and a model is required to learn from the series of past observations to predict the next value in the sequence.<\/p>\n<p>We will demonstrate a number of variations of the LSTM model for univariate time series forecasting.<\/p>\n<p>This section is divided into six parts; they are:<\/p>\n<ol>\n<li>Data Preparation<\/li>\n<li>Vanilla LSTM<\/li>\n<li>Stacked LSTM<\/li>\n<li>Bidirectional LSTM<\/li>\n<li>CNN LSTM<\/li>\n<li>ConvLSTM<\/li>\n<\/ol>\n<p>Each of these models is demonstrated for one-step univariate time series forecasting, but can easily be adapted and used as the input part of a model for other types of time series forecasting problems.<\/p>\n<h3>Data Preparation<\/h3>\n<p>Before a univariate series can be modeled, it must be prepared.<\/p>\n<p>The LSTM model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the LSTM can learn.<\/p>\n<p>Consider a given univariate sequence:<\/p>\n<pre class=\"crayon-plain-tag\">[10, 20, 30, 40, 50, 60, 70, 80, 90]<\/pre>\n<p>We can divide the sequence into multiple input\/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.<\/p>\n<pre class=\"crayon-plain-tag\">X,\t\t\t\ty\r\n10, 20, 30\t\t40\r\n20, 30, 40\t\t50\r\n30, 40, 50\t\t60\r\n...<\/pre>\n<p>The <em>split_sequence()<\/em> function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.<\/p>\n<pre class=\"crayon-plain-tag\"># split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)<\/pre>\n<p>We can demonstrate this function on our small contrived dataset above.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate data preparation\r\nfrom numpy import array\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 
70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 3\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example splits the univariate series into six samples where each sample has three input time steps and one output time step.<\/p>\n<pre class=\"crayon-plain-tag\">[10 20 30] 40\r\n[20 30 40] 50\r\n[30 40 50] 60\r\n[40 50 60] 70\r\n[50 60 70] 80\r\n[60 70 80] 90<\/pre>\n<p>Now that we know how to prepare a univariate series for modeling, let\u2019s look at developing LSTM models that can learn the mapping of inputs to outputs, starting with a Vanilla LSTM.<\/p>\n<h3>Vanilla LSTM<\/h3>\n<p>A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.<\/p>\n<p>We can define a Vanilla LSTM for univariate time series forecasting as follows.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>Key in the definition is the shape of the input; that is what the model expects as input for each sample in terms of the number of time steps and the number of features.<\/p>\n<p>We are working with a univariate series, so the number of features is one, for one variable.<\/p>\n<p>The number of time steps as input is the number we chose when preparing our dataset as an argument to the <em>split_sequence()<\/em> function.<\/p>\n<p>The shape of the input for each sample is specified in the <em>input_shape<\/em> argument on the definition of the first hidden layer.<\/p>\n<p>We almost always have multiple samples; therefore, the model will expect the input component of training data to have the dimensions or shape:<\/p>\n<pre class=\"crayon-plain-tag\">[samples, timesteps, features]<\/pre>\n<p>Our <em>split_sequence()<\/em> function in the previous section outputs the X with the shape [<em>samples, timesteps<\/em>], so we can easily reshape it to have an additional dimension for the one feature.<\/p>\n<pre class=\"crayon-plain-tag\"># reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], 
n_features))<\/pre>\n<p>In this case, we define a model with 50 LSTM units in the hidden layer and an output layer that predicts a single numerical value.<\/p>\n<p>The model is fit using the efficient <a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\">Adam version of stochastic gradient descent<\/a> and optimized using the mean squared error, or \u2018<em>mse<\/em>\u2019 loss function.<\/p>\n<p>Once the model is defined, we can fit it on the training dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)<\/pre>\n<p>After the model is fit, we can use it to make a prediction.<\/p>\n<p>We can predict the next value in the sequence by providing the input:<\/p>\n<pre class=\"crayon-plain-tag\">[70, 80, 90]<\/pre>\n<p>And expecting the model to predict something like:<\/p>\n<pre class=\"crayon-plain-tag\">[100]<\/pre>\n<p>The model expects the input shape to be three-dimensional with [<em>samples, timesteps, features<\/em>]; therefore, we must reshape the single input sample before making the prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)<\/pre>\n<p>We can tie all of this together and demonstrate how to develop a Vanilla LSTM for univariate time series forecasting and make a single prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate lstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 3\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example prepares the data, fits the model, and makes a prediction.<\/p>\n<p>Your results may vary given the stochastic nature of the algorithm; try running the example a few times.<\/p>\n<p>We can see that the model predicts the next value in the sequence.<\/p>\n<pre class=\"crayon-plain-tag\">[[102.09213]]<\/pre>\n<h3>Stacked LSTM<\/h3>\n<p>Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.<\/p>\n<p>An LSTM layer requires a three-dimensional input and LSTMs by default will produce a two-dimensional output as an 
interpretation from the end of the sequence.<\/p>\n<p>We can address this by having the LSTM output a value for each time step in the input data by setting the <em>return_sequences=True<\/em> argument on the layer. This allows us to have a 3D output from the hidden LSTM layer as input to the next.<\/p>\n<p>We can therefore define a Stacked LSTM as follows.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))\r\nmodel.add(LSTM(50, activation='relu'))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>We can tie this together; the complete code example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate stacked lstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a univariate sequence\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 3\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))\r\nmodel.add(LSTM(50, activation='relu'))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example predicts the next value in the sequence, which we expect would be 100.<\/p>\n<pre class=\"crayon-plain-tag\">[[102.47341]]<\/pre>\n<h3>Bidirectional LSTM<\/h3>\n<p>On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backward and concatenate both interpretations.<\/p>\n<p>This is called a <a href=\"https:\/\/machinelearningmastery.com\/develop-bidirectional-lstm-sequence-classification-python-keras\/\">Bidirectional LSTM<\/a>.<\/p>\n<p>We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional.<\/p>\n<p>An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>The complete example of the Bidirectional LSTM for univariate time series forecasting is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate bidirectional lstm example\r\nfrom numpy import array\r\nfrom keras.models import 
Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Bidirectional\r\n\r\n# split a univariate sequence\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 3\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example predicts the next value in the sequence, which we expect would be 100.<\/p>\n<pre class=\"crayon-plain-tag\">[[101.48093]]<\/pre>\n<h3>CNN LSTM<\/h3>\n<p>A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.<\/p>\n<p>The CNN can be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.<\/p>\n<p>A CNN model can be used in a hybrid model with an LSTM back-end, where the CNN interprets subsequences of the input that together are provided as a sequence for the LSTM model to interpret. <a href=\"https:\/\/machinelearningmastery.com\/cnn-long-short-term-memory-networks\/\">This hybrid model is called a CNN-LSTM<\/a>.<\/p>\n<p>The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input\/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input.<\/p>\n<p>We can parameterize this and define the number of subsequences as <em>n_seq<\/em> and the number of time steps per subsequence as <em>n_steps<\/em>. 
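<\/p>\n<p>As a quick sanity check, a minimal sketch (the variable names follow the tutorial): the two new dimensions must multiply back to the original input window.<\/p>\n<pre class=\"crayon-plain-tag\"># minimal sketch: the subsequence split must preserve the input window\r\nn_input = 4\t\t# original time steps per sample\r\nn_seq, n_steps = 2, 2\t# two subsequences of two time steps each\r\nassert n_seq * n_steps == n_input<\/pre>\n<p>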
The input data can then be reshaped to have the required structure:<\/p>\n<pre class=\"crayon-plain-tag\">[samples, subsequences, timesteps, features]<\/pre>\n<p>For example:<\/p>\n<pre class=\"crayon-plain-tag\"># choose a number of time steps\r\nn_steps = 4\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]\r\nn_features = 1\r\nn_seq = 2\r\nn_steps = 2\r\nX = X.reshape((X.shape[0], n_seq, n_steps, n_features))<\/pre>\n<p>We want to reuse the same CNN model when reading in each sub-sequence of data separately.<\/p>\n<p>This can be achieved by wrapping the entire CNN model in a <a href=\"https:\/\/machinelearningmastery.com\/timedistributed-layer-for-long-short-term-memory-networks-in-python\/\">TimeDistributed wrapper<\/a> that will apply the entire model once per input, in this case, once per input subsequence.<\/p>\n<p>The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each \u2018read\u2019 operation of the input sequence.<\/p>\n<p>The convolution layer is followed by a max pooling layer that distills the filter maps down to half of their size, keeping the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))\r\nmodel.add(TimeDistributed(MaxPooling1D(pool_size=2)))\r\nmodel.add(TimeDistributed(Flatten()))<\/pre>\n<p>Next, we can define the LSTM part of the model that interprets the CNN model\u2019s read of the input sequence and makes a prediction.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(LSTM(50, activation='relu'))\r\nmodel.add(Dense(1))<\/pre>\n<p>We can tie all of this together; the complete example of a CNN-LSTM model for univariate time series forecasting is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate cnn lstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.layers import TimeDistributed\r\nfrom keras.layers.convolutional import Conv1D\r\nfrom keras.layers.convolutional import MaxPooling1D\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 4\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]\r\nn_features = 1\r\nn_seq = 2\r\nn_steps = 2\r\nX = X.reshape((X.shape[0], n_seq, n_steps, n_features))\r\n# define model\r\nmodel = 
Sequential()\r\nmodel.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))\r\nmodel.add(TimeDistributed(MaxPooling1D(pool_size=2)))\r\nmodel.add(TimeDistributed(Flatten()))\r\nmodel.add(LSTM(50, activation='relu'))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=500, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([60, 70, 80, 90])\r\nx_input = x_input.reshape((1, n_seq, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example predicts the next value in the sequence, which we expect would be 100.<\/p>\n<pre class=\"crayon-plain-tag\">[[101.69263]]<\/pre>\n<h3>ConvLSTM<\/h3>\n<p>A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.<\/p>\n<p>The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.<\/p>\n<p>The layer expects input as a sequence of two-dimensional images; therefore, the shape of input data must be:<\/p>\n<pre class=\"crayon-plain-tag\">[samples, timesteps, rows, columns, features]<\/pre>\n<p>For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or <em>n_seq<\/em>, and columns will be the number of time steps for each subsequence, or <em>n_steps<\/em>. The number of rows is fixed at 1 as we are working with one-dimensional data.<\/p>\n<p>We can now reshape the prepared samples into the required structure.<\/p>\n<pre class=\"crayon-plain-tag\"># choose a number of time steps\r\nn_steps = 4\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]\r\nn_features = 1\r\nn_seq = 2\r\nn_steps = 2\r\nX = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))<\/pre>\n<p>We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of (rows, columns). 
As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel.<\/p>\n<p>The output of the model must then be flattened before it can be interpreted and a prediction made.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))\r\nmodel.add(Flatten())<\/pre>\n<p>The complete example of a ConvLSTM for one-step univariate time series forecasting is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate convlstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.layers import ConvLSTM2D\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps = 4\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps)\r\n# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]\r\nn_features = 1\r\nn_seq = 2\r\nn_steps = 2\r\nX = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))\r\nmodel.add(Flatten())\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=500, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([60, 70, 80, 90])\r\nx_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example predicts the next value in the sequence, which we expect would be 100.<\/p>\n<pre class=\"crayon-plain-tag\">[[103.68166]]<\/pre>\n<p>Now that we have looked at LSTM models for univariate data, let\u2019s turn our attention to multivariate data.<\/p>\n<h2>Multivariate LSTM Models<\/h2>\n<p>Multivariate time series data means data where there is more than one observation for each time step.<\/p>\n<p>There are two main models that we may require with multivariate time series data; they are:<\/p>\n<ol>\n<li>Multiple Input Series.<\/li>\n<li>Multiple Parallel Series.<\/li>\n<\/ol>\n<p>Let\u2019s take a look at each in turn.<\/p>\n<h3>Multiple Input Series<\/h3>\n<p>A problem may have two or more parallel input time series and an output time series that is dependent on the input time series.<\/p>\n<p>The input time series are parallel because each series has an observation at the same time steps.<\/p>\n<p>We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.<\/p>\n<pre class=\"crayon-plain-tag\"># define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])<\/pre>\n<p>We can reshape 
these three arrays of data as a single dataset where each row is a time step, and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.<\/p>\n<pre class=\"crayon-plain-tag\"># convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))<\/pre>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate data preparation\r\nfrom numpy import array\r\nfrom numpy import hstack\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\nprint(dataset)<\/pre>\n<p>Running the example prints the dataset with one row per time step and one column for each of the two input and one output parallel time series.<\/p>\n<pre class=\"crayon-plain-tag\">[[ 10  15  25]\r\n [ 20  25  45]\r\n [ 30  35  65]\r\n [ 40  45  85]\r\n [ 50  55 105]\r\n [ 60  65 125]\r\n [ 70  75 145]\r\n [ 80  85 165]\r\n [ 90  95 185]]<\/pre>\n<p>As with the univariate time series, we must structure these data into samples with input and output elements.<\/p>\n<p>An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples, maintaining the order of observations across the two input sequences.<\/p>\n<p>If we chose three input time steps, then the first sample would look as follows:<\/p>\n<p>Input:<\/p>\n<pre class=\"crayon-plain-tag\">10, 15\r\n20, 25\r\n30, 35<\/pre>\n<p>Output:<\/p>\n<pre class=\"crayon-plain-tag\">65<\/pre>\n<p>That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case, 65.<\/p>\n<p>We can see that, in transforming the time series into input\/output samples to train the model, we will have to discard some values from the output time series where we do not have values in the input time series at prior time steps. 
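<\/p>\n<p>For example, a minimal back-of-the-envelope check: with 9 time steps in each series and 3 input time steps, the split will yield 9 - 3 + 1 = 7 samples; a longer input window would leave fewer samples.<\/p>\n<pre class=\"crayon-plain-tag\"># minimal sketch: how many samples a one-step split produces\r\nn_obs = 9\t\t# length of each parallel series\r\nn_steps = 3\t\t# input time steps per sample\r\nn_samples = n_obs - n_steps + 1\r\nprint(n_samples)\t# 7<\/pre>\n<p>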
In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.<\/p>\n<p>We can define a function named <em>split_sequences()<\/em> that will take a dataset as we have defined it, with rows for time steps and columns for parallel series, and return input\/output samples.<\/p>\n<pre class=\"crayon-plain-tag\"># split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)<\/pre>\n<p>We can test this function on our dataset using three time steps for each input time series as input.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate data preparation\r\nfrom numpy import array\r\nfrom numpy import hstack\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps = 3\r\n# convert into input\/output\r\nX, y = split_sequences(dataset, n_steps)\r\nprint(X.shape, y.shape)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example first prints the shape of the X and y components.<\/p>\n<p>We can see that the X component has a three-dimensional structure.<\/p>\n<p>The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.<\/p>\n<p>This is the exact three-dimensional structure expected by an LSTM as input. 
The data is ready to use without further reshaping.<\/p>\n<p>We can then see that the input and output for each sample are printed, showing the three time steps for each of the two input series and the associated output for each sample.<\/p>\n<pre class=\"crayon-plain-tag\">(7, 3, 2) (7,)\r\n\r\n[[10 15]\r\n [20 25]\r\n [30 35]] 65\r\n[[20 25]\r\n [30 35]\r\n [40 45]] 85\r\n[[30 35]\r\n [40 45]\r\n [50 55]] 105\r\n[[40 45]\r\n [50 55]\r\n [60 65]] 125\r\n[[50 55]\r\n [60 65]\r\n [70 75]] 145\r\n[[60 65]\r\n [70 75]\r\n [80 85]] 165\r\n[[70 75]\r\n [80 85]\r\n [90 95]] 185<\/pre>\n<p>We are now ready to fit an LSTM model on this data.<\/p>\n<p>Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN, or ConvLSTM model.<\/p>\n<p>We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified for the input layer via the <em>input_shape<\/em> argument.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>When making a prediction, the model expects three time steps for two input time series.<\/p>\n<p>We can predict the next value in the output series by providing the input values of:<\/p>\n<pre class=\"crayon-plain-tag\">80,\t 85\r\n90,\t 95\r\n100, 105<\/pre>\n<p>The shape of the one sample with three time steps and two variables must be [1, 3, 2].<\/p>\n<p>We would expect the next value in the sequence to be 100 + 105, or 205.<\/p>\n<pre class=\"crayon-plain-tag\"># demonstrate prediction\r\nx_input = array([[80, 85], [90, 95], [100, 105]])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)<\/pre>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate lstm example\r\nfrom numpy import array\r\nfrom numpy import hstack\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps = 3\r\n# convert into input\/output\r\nX, y = split_sequences(dataset, n_steps)\r\n# the dataset knows the number of features, e.g. 
2\r\nn_features = X.shape[2]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))\r\nmodel.add(Dense(1))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([[80, 85], [90, 95], [100, 105]])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example prepares the data, fits the model, and makes a prediction.<\/p>\n<pre class=\"crayon-plain-tag\">[[208.13531]]<\/pre>\n<h3>Multiple Parallel Series<\/h3>\n<p>An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.<\/p>\n<p>For example, given the data from the previous section:<\/p>\n<pre class=\"crayon-plain-tag\">[[ 10  15  25]\r\n [ 20  25  45]\r\n [ 30  35  65]\r\n [ 40  45  85]\r\n [ 50  55 105]\r\n [ 60  65 125]\r\n [ 70  75 145]\r\n [ 80  85 165]\r\n [ 90  95 185]]<\/pre>\n<p>We may want to predict the value for each of the three time series for the next time step.<\/p>\n<p>This might be referred to as multivariate forecasting.<\/p>\n<p>Again, the data must be split into input\/output samples in order to train a model.<\/p>\n<p>The first sample of this dataset would be:<\/p>\n<p>Input:<\/p>\n<pre class=\"crayon-plain-tag\">10, 15, 25\r\n20, 25, 45\r\n30, 35, 65<\/pre>\n<p>Output:<\/p>\n<pre class=\"crayon-plain-tag\">40, 45, 85<\/pre>\n<p>The <em>split_sequences()<\/em> function below will split multiple parallel time series with rows for time steps and one series per column into the required input\/output shape.<\/p>\n<pre class=\"crayon-plain-tag\"># split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)<\/pre>\n<p>We can demonstrate this on the contrived problem; the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate output data prep\r\nfrom numpy import array\r\nfrom numpy import hstack\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps = 3\r\n# 
convert into input\/output\r\nX, y = split_sequences(dataset, n_steps)\r\nprint(X.shape, y.shape)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example first prints the shape of the prepared X and y components.<\/p>\n<p>The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3).<\/p>\n<p>The shape of y is two-dimensional as we might expect for the number of samples (6) and the number of variables per sample to be predicted (3).<\/p>\n<p>The data is ready to use in an LSTM model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.<\/p>\n<p>Then, each of the samples is printed showing the input and output components of each sample.<\/p>\n<pre class=\"crayon-plain-tag\">(6, 3, 3) (6, 3)\r\n\r\n[[10 15 25]\r\n [20 25 45]\r\n [30 35 65]] [40 45 85]\r\n[[20 25 45]\r\n [30 35 65]\r\n [40 45 85]] [ 50  55 105]\r\n[[ 30  35  65]\r\n [ 40  45  85]\r\n [ 50  55 105]] [ 60  65 125]\r\n[[ 40  45  85]\r\n [ 50  55 105]\r\n [ 60  65 125]] [ 70  75 145]\r\n[[ 50  55 105]\r\n [ 60  65 125]\r\n [ 70  75 145]] [ 80  85 165]\r\n[[ 60  65 125]\r\n [ 70  75 145]\r\n [ 80  85 165]] [ 90  95 185]<\/pre>\n<p>We are now ready to fit an LSTM model on this data.<\/p>\n<p>Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked, Bidirectional, CNN, or ConvLSTM model.<\/p>\n<p>We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for the input layer via the <em>input_shape<\/em> argument. The number of parallel series is also used to specify the number of values the model will predict in the output layer; again, this is three.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))\r\nmodel.add(LSTM(100, activation='relu'))\r\nmodel.add(Dense(n_features))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>We can predict the next value in each of the three parallel series by providing an input of three time steps for each series.<\/p>\n<pre class=\"crayon-plain-tag\">70, 75, 145\r\n80, 85, 165\r\n90, 95, 185<\/pre>\n<p>The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features, or [1, 3, 3].<\/p>\n<pre class=\"crayon-plain-tag\"># demonstrate prediction\r\nx_input = array([[70,75,145], [80,85,165], [90,95,185]])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)<\/pre>\n<p>We would expect the vector output to be:<\/p>\n<pre class=\"crayon-plain-tag\">[100, 105, 205]<\/pre>\n<p>We can tie all of this together and demonstrate a Stacked LSTM for multivariate output time series forecasting below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate output stacked lstm example\r\nfrom numpy import array\r\nfrom numpy import hstack\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the dataset\r\n\t\tif end_ix > len(sequences)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output 
parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps = 3\r\n# convert into input\/output\r\nX, y = split_sequences(dataset, n_steps)\r\n# the dataset knows the number of features, e.g. 3\r\nn_features = X.shape[2]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))\r\nmodel.add(LSTM(100, activation='relu'))\r\nmodel.add(Dense(n_features))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=400, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([[70,75,145], [80,85,165], [90,95,185]])\r\nx_input = x_input.reshape((1, n_steps, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example prepares the data, fits the model, and makes a prediction.<\/p>\n<pre class=\"crayon-plain-tag\">[[101.76599 108.730484 206.63577 ]]<\/pre>\n<h2>Multi-Step LSTM Models<\/h2>\n<p>A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting.<\/p>\n<p>Specifically, these are problems where the forecast horizon or interval is more than one time step.<\/p>\n<p>There are two main types of LSTM models that can be used for multi-step forecasting; they are:<\/p>\n<ol>\n<li>Vector Output Model<\/li>\n<li>Encoder-Decoder Model<\/li>\n<\/ol>\n<p>Before we look at these models, let\u2019s first look at the preparation of data for multi-step forecasting.<\/p>\n<h3>Data Preparation<\/h3>\n<p>As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.<\/p>\n<p>Both the input and output components will be comprised of multiple time steps and may or may not have the same number of steps.<\/p>\n<p>For example, given the univariate time series:<\/p>\n<pre class=\"crayon-plain-tag\">[10, 20, 30, 40, 50, 60, 70, 80, 90]<\/pre>\n<p>We could use the last three time steps as input and forecast the next two time steps.<\/p>\n<p>The first sample would look as follows:<\/p>\n<p>Input:<\/p>\n<pre class=\"crayon-plain-tag\">[10, 20, 30]<\/pre>\n<p>Output:<\/p>\n<pre class=\"crayon-plain-tag\">[40, 50]<\/pre>\n<p>The <em>split_sequence()<\/em> function below implements this behavior and will split a given univariate time series into samples with a specified number of input and output time steps.<\/p>\n<pre class=\"crayon-plain-tag\"># split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out\r\n\t\t# check if we are beyond the sequence\r\n\t\tif out_end_ix > len(sequence):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the 
pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)<\/pre>\n<p>We can demonstrate this function on the small contrived dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multi-step data preparation\r\nfrom numpy import array\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out\r\n\t\t# check if we are beyond the sequence\r\n\t\tif out_end_ix > len(sequence):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps_in, n_steps_out)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example splits the univariate series into input and output time steps and prints the input and output components of each.<\/p>\n<pre class=\"crayon-plain-tag\">[10 20 30] [40 50]\r\n[20 30 40] [50 60]\r\n[30 40 50] [60 70]\r\n[40 50 60] [70 80]\r\n[50 60 70] [80 90]<\/pre>\n<p>Now that we know how to prepare data for multi-step forecasting, let\u2019s look at some LSTM models that can learn this mapping.<\/p>\n<h3>Vector Output Model<\/h3>\n<p>Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast.<\/p>\n<p>This approach was seen in the previous section where one time step of each output time series was forecasted as a vector.<\/p>\n<p>As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of [<em>samples, timesteps, features<\/em>], and in this case, we only have one feature, so the reshape is straightforward.<\/p>\n<pre class=\"crayon-plain-tag\"># reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))<\/pre>\n<p>With the number of input and output steps specified in the <em>n_steps_in<\/em> and <em>n_steps_out<\/em> variables, we can define a multi-step time-series forecasting model.<\/p>\n<p>Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional, CNN-LSTM, or ConvLSTM. Below defines a Stacked LSTM for multi-step forecasting.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))\r\nmodel.add(LSTM(100, activation='relu'))\r\nmodel.add(Dense(n_steps_out))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>The model can make a prediction for a single sample. 
We can predict the next two steps beyond the end of the dataset by providing the input:<\/p>\n<pre class=\"crayon-plain-tag\">[70, 80, 90]<\/pre>\n<p>We would expect the predicted output to be:<\/p>\n<pre class=\"crayon-plain-tag\">[100, 110]<\/pre>\n<p>As expected by the model, the shape of the single sample of input data when making the prediction must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.<\/p>\n<pre class=\"crayon-plain-tag\"># demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps_in, n_features))\r\nyhat = model.predict(x_input, verbose=0)<\/pre>\n<p>Tying all of this together, the Stacked LSTM for multi-step forecasting with a univariate time series is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate multi-step vector-output stacked lstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out\r\n\t\t# check if we are beyond the sequence\r\n\t\tif out_end_ix > len(sequence):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps_in, n_steps_out)\r\n# reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))\r\nmodel.add(LSTM(100, activation='relu'))\r\nmodel.add(Dense(n_steps_out))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=50, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps_in, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example forecasts and prints the next two time steps in the sequence.<\/p>\n<pre class=\"crayon-plain-tag\">[[100.98096 113.28924]]<\/pre>\n<h3>Encoder-Decoder Model<\/h3>\n<p>A model specifically developed for forecasting variable-length output sequences is called the <a href=\"https:\/\/machinelearningmastery.com\/encoder-decoder-long-short-term-memory-networks\/\">Encoder-Decoder LSTM<\/a>.<\/p>\n<p>The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another.<\/p>\n<p>This model can be used for multi-step time series forecasting.<\/p>\n<p>As its name suggests, the model is comprised of two sub-models: the encoder and the decoder.<\/p>\n<p>The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed-length vector that represents the model\u2019s interpretation of the sequence. 
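<\/p>\n<p>To make that concrete, a minimal sketch (the layer sizes here are illustrative): an encoder LSTM with 100 units compresses each three-step input sequence into a single 100-element vector per sample.<\/p>\n<pre class=\"crayon-plain-tag\"># minimal sketch: the encoder emits one fixed-length vector per sample\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nencoder = Sequential()\r\nencoder.add(LSTM(100, activation='relu', input_shape=(3, 1)))\r\nprint(encoder.output_shape)\t# (None, 100)<\/pre>\n<p>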
The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))<\/pre>\n<p>The decoder uses the output of the encoder as an input.<\/p>\n<p>First, the fixed-length output of the encoder is repeated, once for each required time step in the output sequence.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(RepeatVector(n_steps_out))<\/pre>\n<p>This sequence is then provided to an LSTM decoder model. The model must output a value for each time step in the output sequence, which can be interpreted by a single output model.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(LSTM(100, activation='relu', return_sequences=True))<\/pre>\n<p>We can use the same output layer or layers to make each one-step prediction in the output sequence. This can be achieved by wrapping the output part of the model in a <a href=\"https:\/\/machinelearningmastery.com\/timedistributed-layer-for-long-short-term-memory-networks-in-python\/\">TimeDistributed wrapper<\/a>.<\/p>\n<pre class=\"crayon-plain-tag\">model.add(TimeDistributed(Dense(1)))<\/pre>\n<p>The full definition for an Encoder-Decoder model for multi-step time series forecasting is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))\r\nmodel.add(RepeatVector(n_steps_out))\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True))\r\nmodel.add(TimeDistributed(Dense(1)))\r\nmodel.compile(optimizer='adam', loss='mse')<\/pre>\n<p>As with other LSTM models, the input data must be reshaped into the expected three-dimensional shape of [<em>samples, timesteps, features<\/em>].<\/p>\n<pre class=\"crayon-plain-tag\">X = X.reshape((X.shape[0], X.shape[1], n_features))<\/pre>\n<p>In the case of the Encoder-Decoder model, the output, or y part, of the training dataset must also have this shape. 
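<\/p>\n<p>For our contrived problem, a minimal sketch of the expected dimensions (the counts follow from the split with <em>n_steps_in=3<\/em> and <em>n_steps_out=2<\/em>):<\/p>\n<pre class=\"crayon-plain-tag\"># minimal sketch: expected shapes for the 9-step contrived series\r\n# samples = 9 - (3 + 2) + 1 = 5\r\n# X: (5, 3, 1) -> [samples, timesteps, features]\r\n# y: (5, 2, 1) -> [samples, timesteps, features]<\/pre>\n<p>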
This is because the model will predict a given number of time steps with a given number of features for each input sample.<\/p>\n<pre class=\"crayon-plain-tag\">y = y.reshape((y.shape[0], y.shape[1], n_features))<\/pre>\n<p>The complete example of an Encoder-Decoder LSTM for multi-step time series forecasting is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># univariate multi-step encoder-decoder lstm example\r\nfrom numpy import array\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import RepeatVector\r\nfrom keras.layers import TimeDistributed\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out\r\n\t\t# check if we are beyond the sequence\r\n\t\tif out_end_ix > len(sequence):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nraw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# split into samples\r\nX, y = split_sequence(raw_seq, n_steps_in, n_steps_out)\r\n# reshape from [samples, timesteps] into [samples, timesteps, features]\r\nn_features = 1\r\nX = X.reshape((X.shape[0], X.shape[1], n_features))\r\ny = y.reshape((y.shape[0], y.shape[1], n_features))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))\r\nmodel.add(RepeatVector(n_steps_out))\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True))\r\nmodel.add(TimeDistributed(Dense(1)))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=100, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([70, 80, 90])\r\nx_input = x_input.reshape((1, n_steps_in, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example forecasts and prints the next two time steps in the sequence. Note that, because of the TimeDistributed output layer, the prediction has the three-dimensional shape [1, 2, 1] rather than the two-dimensional vector produced by the previous model.<\/p>\n<pre class=\"crayon-plain-tag\">[[[101.9736  ]\r\n  [116.213615]]]<\/pre>\n
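<p>The Vanilla LSTM encoder used here is not required; as mentioned above, other encoder models can be used. As a rough sketch only (reusing the X, y, n_steps_in, n_steps_out, and n_features prepared in the listing above; the configuration is arbitrary and untuned), a Bidirectional encoder could be swapped in as follows.<\/p>\n<pre class=\"crayon-plain-tag\"># sketch: encoder-decoder with a Bidirectional encoder\r\n# assumes X, y, n_steps_in, n_steps_out and n_features are prepared as above\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import RepeatVector\r\nfrom keras.layers import TimeDistributed\r\nfrom keras.layers import Bidirectional\r\n\r\n# define model\r\nmodel = Sequential()\r\n# the Bidirectional wrapper reads the input sequence forwards and backwards\r\nmodel.add(Bidirectional(LSTM(100, activation='relu'), input_shape=(n_steps_in, n_features)))\r\n# repeat the fixed-length encoding once per required output time step\r\nmodel.add(RepeatVector(n_steps_out))\r\n# decoder returns one hidden state per output time step\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True))\r\n# the same single-output layer makes each one-step prediction\r\nmodel.add(TimeDistributed(Dense(1)))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=100, verbose=0)<\/pre>\n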
<h2>Multivariate Multi-Step LSTM Models<\/h2>\n<p>In the previous sections, we have looked at univariate, multivariate, and multi-step time series forecasting.<\/p>\n<p>It is possible to mix and match the different types of LSTM models presented so far for the different problems. This too applies to time series forecasting problems that involve multivariate and multi-step forecasting, but it may be a little more challenging.<\/p>\n<p>In this section, we will provide short examples of data preparation and modeling for multivariate multi-step time series forecasting as a template to ease this challenge, specifically:<\/p>\n<ol>\n<li>Multiple Input Multi-Step Output.<\/li>\n<li>Multiple Parallel Input and Multi-Step Output.<\/li>\n<\/ol>\n<p>Perhaps the biggest stumbling block is the preparation of data, so this is where we will focus our attention.<\/p>\n<h3>Multiple Input Multi-Step Output<\/h3>\n<p>There are multivariate time series forecasting problems where the output series is separate but dependent upon the input time series, and multiple time steps are required for the output series.<\/p>\n<p>For example, consider our multivariate time series from a prior section:<\/p>\n<pre class=\"crayon-plain-tag\">[[ 10  15  25]\r\n [ 20  25  45]\r\n [ 30  35  65]\r\n [ 40  45  85]\r\n [ 50  55 105]\r\n [ 60  65 125]\r\n [ 70  75 145]\r\n [ 80  85 165]\r\n [ 90  95 185]]<\/pre>\n<p>We may use three prior time steps of each of the two input time series to predict two time steps of the output time series.<\/p>\n<p>Input:<\/p>\n<pre class=\"crayon-plain-tag\">10, 15\r\n20, 25\r\n30, 35<\/pre>\n<p>Output:<\/p>\n<pre class=\"crayon-plain-tag\">65\r\n85<\/pre>\n<p>Note that the output window is aligned to begin at the same time step as the last input row, which is why the first output value is 65 rather than 85.<\/p>\n<p>The <em>split_sequences()<\/em> function below implements this behavior; the <em>end_ix-1<\/em> and <em>n_steps_out-1<\/em> offsets implement the alignment described above.<\/p>\n<pre class=\"crayon-plain-tag\"># split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out-1\r\n\t\t# check if we are beyond the dataset\r\n\t\tif out_end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)<\/pre>\n<p>We can demonstrate this on our contrived dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate multi-step data preparation\r\nfrom numpy import array\r\nfrom numpy import hstack\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out-1\r\n\t\t# check if we are beyond the dataset\r\n\t\tif out_end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# convert into input\/output\r\nX, y = 
split_sequences(dataset, n_steps_in, n_steps_out)\r\nprint(X.shape, y.shape)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example first prints the shape of the prepared training data.<\/p>\n<p>We can see that the shape of the input portion of the samples is three-dimensional: six samples, each with three time steps and two variables for the two input time series.<\/p>\n<p>The output portion of the samples is two-dimensional: six samples, each with the two time steps to be predicted.<\/p>\n<p>The prepared samples are then printed to confirm that the data was prepared as we specified.<\/p>\n<pre class=\"crayon-plain-tag\">(6, 3, 2) (6, 2)\r\n\r\n[[10 15]\r\n [20 25]\r\n [30 35]] [65 85]\r\n[[20 25]\r\n [30 35]\r\n [40 45]] [ 85 105]\r\n[[30 35]\r\n [40 45]\r\n [50 55]] [105 125]\r\n[[40 45]\r\n [50 55]\r\n [60 65]] [125 145]\r\n[[50 55]\r\n [60 65]\r\n [70 75]] [145 165]\r\n[[60 65]\r\n [70 75]\r\n [80 85]] [165 185]<\/pre>\n<p>We can now develop an LSTM model for multi-step predictions.<\/p>\n<p>A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a vector output with a Stacked LSTM; an encoder-decoder sketch is given after the example below.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate multi-step stacked lstm example\r\nfrom numpy import array\r\nfrom numpy import hstack\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out-1\r\n\t\t# check if we are beyond the dataset\r\n\t\tif out_end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# convert into input\/output\r\nX, y = split_sequences(dataset, n_steps_in, n_steps_out)\r\n# the dataset knows the number of features, e.g. 
2\r\nn_features = X.shape[2]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))\r\nmodel.add(LSTM(100, activation='relu'))\r\nmodel.add(Dense(n_steps_out))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([[70, 75], [80, 85], [90, 95]])\r\nx_input = x_input.reshape((1, n_steps_in, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example fits the model and predicts the next two time steps of the output sequence beyond the dataset.<\/p>\n<p>We would expect the next two steps to be: [185, 205].<\/p>\n<p>It is a challenging framing of the problem with very little data, and the arbitrarily configured version of the model gets close.<\/p>\n<pre class=\"crayon-plain-tag\">[[188.70619 210.16513]]<\/pre>\n
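<p>Alternately, the same problem can be framed with an Encoder-Decoder LSTM. The sketch below reuses the X, y, n_steps_in, n_steps_out, and n_features prepared in the listing above; the configuration is arbitrary and untuned. The only change to the data is that the output part must be reshaped to the three-dimensional [samples, timesteps, features] shape expected by the TimeDistributed output layer.<\/p>\n<pre class=\"crayon-plain-tag\"># sketch: encoder-decoder alternative for multiple input multi-step output\r\n# assumes X, y, n_steps_in, n_steps_out and n_features are prepared as above\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import RepeatVector\r\nfrom keras.layers import TimeDistributed\r\n\r\n# reshape output from [samples, timesteps] into [samples, timesteps, features]\r\ny = y.reshape((y.shape[0], y.shape[1], 1))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))\r\nmodel.add(RepeatVector(n_steps_out))\r\nmodel.add(LSTM(100, activation='relu', return_sequences=True))\r\nmodel.add(TimeDistributed(Dense(1)))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=200, verbose=0)<\/pre>\n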
:]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# covert into input\/output\r\nX, y = split_sequences(dataset, n_steps_in, n_steps_out)\r\nprint(X.shape, y.shape)\r\n# summarize the data\r\nfor i in range(len(X)):\r\n\tprint(X[i], y[i])<\/pre>\n<p>Running the example first prints the shape of the prepared training dataset.<\/p>\n<p>We can see that both the input (X) and output (Y) elements of the dataset are three dimensional for the number of samples, time steps, and variables or parallel time series respectively.<\/p>\n<p>The input and output elements of each series are then printed side by side so that we can confirm that the data was prepared as we expected.<\/p>\n<pre class=\"crayon-plain-tag\">(5, 3, 3) (5, 2, 3)\r\n\r\n[[10 15 25]\r\n [20 25 45]\r\n [30 35 65]] [[ 40  45  85]\r\n [ 50  55 105]]\r\n[[20 25 45]\r\n [30 35 65]\r\n [40 45 85]] [[ 50  55 105]\r\n [ 60  65 125]]\r\n[[ 30  35  65]\r\n [ 40  45  85]\r\n [ 50  55 105]] [[ 60  65 125]\r\n [ 70  75 145]]\r\n[[ 40  45  85]\r\n [ 50  55 105]\r\n [ 60  65 125]] [[ 70  75 145]\r\n [ 80  85 165]]\r\n[[ 50  55 105]\r\n [ 60  65 125]\r\n [ 70  75 145]] [[ 80  85 165]\r\n [ 90  95 185]]<\/pre>\n<p>We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. 
<p>In this case, we will use the Encoder-Decoder model.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># multivariate multi-step encoder-decoder lstm example\r\nfrom numpy import array\r\nfrom numpy import hstack\r\nfrom keras.models import Sequential\r\nfrom keras.layers import LSTM\r\nfrom keras.layers import Dense\r\nfrom keras.layers import RepeatVector\r\nfrom keras.layers import TimeDistributed\r\n\r\n# split a multivariate sequence into samples\r\ndef split_sequences(sequences, n_steps_in, n_steps_out):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequences)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps_in\r\n\t\tout_end_ix = end_ix + n_steps_out\r\n\t\t# check if we are beyond the dataset\r\n\t\tif out_end_ix > len(sequences):\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix:out_end_ix, :]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn array(X), array(y)\r\n\r\n# define input sequence\r\nin_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])\r\nin_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])\r\nout_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])\r\n# convert to [rows, columns] structure\r\nin_seq1 = in_seq1.reshape((len(in_seq1), 1))\r\nin_seq2 = in_seq2.reshape((len(in_seq2), 1))\r\nout_seq = out_seq.reshape((len(out_seq), 1))\r\n# horizontally stack columns\r\ndataset = hstack((in_seq1, in_seq2, out_seq))\r\n# choose a number of time steps\r\nn_steps_in, n_steps_out = 3, 2\r\n# convert into input\/output\r\nX, y = split_sequences(dataset, n_steps_in, n_steps_out)\r\n# the dataset knows the number of features, e.g. 3\r\nn_features = X.shape[2]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(200, activation='relu', input_shape=(n_steps_in, n_features)))\r\nmodel.add(RepeatVector(n_steps_out))\r\nmodel.add(LSTM(200, activation='relu', return_sequences=True))\r\nmodel.add(TimeDistributed(Dense(n_features)))\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit model\r\nmodel.fit(X, y, epochs=300, verbose=0)\r\n# demonstrate prediction\r\nx_input = array([[60, 65, 125], [70, 75, 145], [80, 85, 165]])\r\nx_input = x_input.reshape((1, n_steps_in, n_features))\r\nyhat = model.predict(x_input, verbose=0)\r\nprint(yhat)<\/pre>\n<p>Running the example fits the model and predicts the values for each of the three time series for the next two time steps beyond the end of the dataset.<\/p>\n<p>We would expect the values for these series and time steps to be as follows:<\/p>\n<pre class=\"crayon-plain-tag\">90, 95, 185\r\n100, 105, 205<\/pre>\n<p>We can see that the model forecast gets reasonably close to the expected values.<\/p>\n<pre class=\"crayon-plain-tag\">[[[ 91.86044   97.77231  189.66768 ]\r\n  [103.299355 109.18123  212.6863  ]]]<\/pre>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a suite of LSTM models for a range of standard time series forecasting problems.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to develop LSTM models for univariate time series forecasting.<\/li>\n<li>How to develop LSTM models for multivariate time series forecasting.<\/li>\n<li>How to develop LSTM models for multi-step time series forecasting.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" 
href=\"https:\/\/machinelearningmastery.com\/how-to-develop-lstm-models-for-time-series-forecasting\/\">How to Develop LSTM Models for Time Series Forecasting<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-develop-lstm-models-for-time-series-forecasting\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting. There are many types of LSTM models [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/11\/13\/how-to-develop-lstm-models-for-time-series-forecasting\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":1290,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1289"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1289"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1289\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/1290"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1289"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1289"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1289"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}