{"id":1212,"date":"2018-10-25T18:00:24","date_gmt":"2018-10-25T18:00:24","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/25\/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting\/"},"modified":"2018-10-25T18:00:24","modified_gmt":"2018-10-25T18:00:24","slug":"how-to-grid-search-naive-methods-for-univariate-time-series-forecasting","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/25\/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting\/","title":{"rendered":"How to Grid Search Naive Methods for Univariate Time Series Forecasting"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Simple forecasting methods include naively using the last observation as the prediction or an average of prior observations.<\/p>\n<p>It is important to evaluate the performance of simple forecasting methods on univariate time series forecasting problems before using more sophisticated methods, as their performance provides a lower bound and point of comparison that can be used to determine whether a model has skill for a given problem.<\/p>\n<p>Although simple, methods such as the naive and average forecast strategies can be tuned to a specific problem in terms of the choice of which prior observation to persist or how many prior observations to average. 
Often, tuning the hyperparameters of these simple strategies can provide a more robust and defensible lower bound on model performance, as well as surprising results that may inform the choice and configuration of more sophisticated methods.<\/p>\n<p>In this tutorial, you will discover how to develop a framework from scratch for grid searching simple naive and averaging strategies for time series forecasting with univariate data.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to develop a framework for grid searching simple models from scratch using walk-forward validation.<\/li>\n<li>How to grid search simple model hyperparameters for daily time series data for births.<\/li>\n<li>How to grid search simple model hyperparameters for monthly time series data for shampoo sales, car sales, and temperature.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_6363\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6363 size-full\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/10\/How-to-Grid-Search-Naive-Methods-for-Univariate-Time-Series-Forecasting.jpg\" alt=\"How to Grid Search Naive Methods for Univariate Time Series Forecasting\" width=\"640\" height=\"480\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/10\/How-to-Grid-Search-Naive-Methods-for-Univariate-Time-Series-Forecasting.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/10\/How-to-Grid-Search-Naive-Methods-for-Univariate-Time-Series-Forecasting-300x225.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">How to Grid Search Naive Methods for Univariate Time Series Forecasting<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/robandstephanielevy\/526862866\/\">Rob and Stephanie Levy<\/a>, some rights 
reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into six parts; they are:<\/p>\n<ol>\n<li>Simple Forecasting Strategies<\/li>\n<li>Develop a Grid Search Framework<\/li>\n<li>Case Study 1: No Trend or Seasonality<\/li>\n<li>Case Study 2: Trend<\/li>\n<li>Case Study 3: Seasonality<\/li>\n<li>Case Study 4: Trend and Seasonality<\/li>\n<\/ol>\n<h2>Simple Forecasting Strategies<\/h2>\n<p>It is important and useful to test simple forecast strategies prior to testing more complex models.<\/p>\n<p>Simple forecast strategies are those that assume little or nothing about the nature of the forecast problem and are fast to implement and calculate.<\/p>\n<p>The results can be used as a baseline in performance and as a point of comparison. If a model performs better than a simple forecast strategy, then it can be said to be skillful.<\/p>\n<p>There are two main themes to simple forecast strategies; they are:<\/p>\n<ul>\n<li><strong>Naive<\/strong>, or using observation values directly.<\/li>\n<li><strong>Average<\/strong>, or using a statistic calculated on previous observations.<\/li>\n<\/ul>\n<p>Let\u2019s take a closer look at both of these strategies.<\/p>\n<h3>Naive Forecasting Strategy<\/h3>\n<p>A naive forecast involves using the previous observation directly as the forecast without any change.<\/p>\n<p>It is often called the persistence forecast as the prior observation is persisted.<\/p>\n<p>This simple approach can be adjusted slightly for seasonal data. 
In this case, the observation at the same time in the previous cycle may be persisted instead.<\/p>\n<p>This can be further generalized to testing each possible offset into the historical data that could be used to persist a value for a forecast.<\/p>\n<p>For example, given the series:<\/p>\n<pre class=\"crayon-plain-tag\">[1, 2, 3, 4, 5, 6, 7, 8, 9]<\/pre>\n<p>We could persist the last observation (relative index -1) as the value 9 or persist the second last prior observation (relative index -2) as 8, and so on.<\/p>\n<h3>Average Forecast Strategy<\/h3>\n<p>One step above the naive forecast is the strategy of averaging prior values.<\/p>\n<p>All prior observations are collected and averaged, either using the mean or the median, with no other treatment to the data.<\/p>\n<p>In some cases, we may want to shorten the history used in the average calculation to the last few observations.<\/p>\n<p>We can generalize this to the case of testing each possible set of n-prior observations to be included into the average calculation.<\/p>\n<p>For example, given the series:<\/p>\n<pre class=\"crayon-plain-tag\">[1, 2, 3, 4, 5, 6, 7, 8, 9]<\/pre>\n<p>We could average the last one observation (9), the last two observations (8, 9), and so on.<\/p>\n<p>In the case of seasonal data, we may want to average the last n-prior observations at the same time in the cycle as the time that is being forecasted.<\/p>\n<p>For example, given the series with a 3-step cycle:<\/p>\n<pre class=\"crayon-plain-tag\">[1, 2, 3, 1, 2, 3, 1, 2, 3]<\/pre>\n<p>We could use a window size of 3 and average the last one observation (-3 or 1), the last two observations (-3 or 1, and -(3 * 2) or 1), and so on.<\/p>\n<\/p>\n<h2>Develop a Grid Search Framework<\/h2>\n<p>In this section, we will develop a framework for grid searching the two simple forecast strategies described in the previous section, namely the naive and average strategies.<\/p>\n<p>We can start off by implementing a naive forecast strategy.<\/p>\n<p>For a given dataset of historical observations, we can persist any value in that history, that is from the previous observation at index -1 to the first observation in the history at -(len(data)).<\/p>\n<p>The <em>naive_forecast()<\/em> function below implements the naive forecast strategy for a given offset from 1 to the length of the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># one-step naive forecast\r\ndef naive_forecast(history, n):\r\n\treturn history[-n]<\/pre>\n<p>We can test this function out on a small contrived dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># one-step naive forecast\r\ndef naive_forecast(history, n):\r\n\treturn history[-n]\r\n\r\n# define dataset\r\ndata = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 
100.0]\r\nprint(data)\r\n# test naive forecast\r\nfor i in range(1, len(data)+1):\r\n\tprint(naive_forecast(data, i))<\/pre>\n<p>Running the example first prints the contrived dataset, then the naive forecast for each offset in the historical dataset.<\/p>\n<pre class=\"crayon-plain-tag\">[10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]\r\n100.0\r\n90.0\r\n80.0\r\n70.0\r\n60.0\r\n50.0\r\n40.0\r\n30.0\r\n20.0\r\n10.0<\/pre>\n<p>We can now look at developing a function for the average forecast strategy.<\/p>\n<p>Averaging the last n observations is straightforward; for example:<\/p>\n<pre class=\"crayon-plain-tag\">from numpy import mean\r\nresult = mean(history[-n:])<\/pre>\n<p>We may also want to test out the median in those cases where the distribution of observations is non-Gaussian.<\/p>\n<pre class=\"crayon-plain-tag\">from numpy import median\r\nresult = median(history[-n:])<\/pre>\n<p>The <em>average_forecast()<\/em> function below implements this, taking the historical data and a config array or tuple that specifies the number of prior values to average as an integer, and a string that describes the way to calculate the average (\u2018<em>mean<\/em>\u2018 or \u2018<em>median<\/em>\u2018).<\/p>\n<pre class=\"crayon-plain-tag\"># one-step average forecast\r\ndef average_forecast(history, config):\r\n\tn, avg_type = config\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(history[-n:])\r\n\t# median of last n values\r\n\treturn median(history[-n:])<\/pre>\n<p>The complete example on a small contrived dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\">from numpy import mean\r\nfrom numpy import median\r\n\r\n# one-step average forecast\r\ndef average_forecast(history, config):\r\n\tn, avg_type = config\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(history[-n:])\r\n\t# median of last n values\r\n\treturn median(history[-n:])\r\n\r\n# define dataset\r\ndata = [10.0, 20.0, 30.0, 40.0, 
50.0, 60.0, 70.0, 80.0, 90.0, 100.0]\r\nprint(data)\r\n# test average forecast\r\nfor i in range(1, len(data)+1):\r\n\tprint(average_forecast(data, (i, 'mean')))<\/pre>\n<p>Running the example forecasts the next value in the series as the mean value from contiguous subsets of prior observations from -1 to -10, inclusively.<\/p>\n<pre class=\"crayon-plain-tag\">[10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]\r\n100.0\r\n95.0\r\n90.0\r\n85.0\r\n80.0\r\n75.0\r\n70.0\r\n65.0\r\n60.0\r\n55.0<\/pre>\n<p>We can update the function to support averaging over seasonal data, respecting the seasonal offset.<\/p>\n<p>An offset argument can be added to the function that, when not set to 1, determines how many observations to step back between each value included in the average.<\/p>\n<p>For example, if n=1 and offset=3, then the average is calculated from the single value at index -(n*offset), or -(1*3) = -3. If n=2 and offset=3, then the average is calculated from the values at indexes -(1*3) = -3 and -(2*3) = -6.<\/p>\n<p>We can also add some protection to raise an exception when a seasonal configuration (n * offset) extends beyond the available historical observations.<\/p>\n<p>The updated function is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># one-step average forecast\r\ndef average_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)<\/pre>\n<p>We can test out this function on a small contrived dataset with a seasonal cycle.<\/p>\n<p>The 
complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\">from numpy import mean\r\nfrom numpy import median\r\n\r\n# one-step average forecast\r\ndef average_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)\r\n\r\n# define dataset\r\ndata = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0]\r\nprint(data)\r\n# test seasonal average forecast\r\nfor i in [1, 2, 3]:\r\n\tprint(average_forecast(data, (i, 3, 'mean')))<\/pre>\n<p>Running the example calculates the mean values of [10], [10, 10] and [10, 10, 10].<\/p>\n<pre class=\"crayon-plain-tag\">[10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0]\r\n10.0\r\n10.0\r\n10.0<\/pre>\n<p>It is possible to combine both the naive and the average forecast strategies together into the same function.<\/p>\n<p>There is a little overlap between the methods, specifically the <em>n-<\/em>offset into the history that is used to either persist values or determine the number of values to average.<\/p>\n<p>It is helpful to have both strategies supported by one function so that we can test a suite of configurations for both strategies at once as part of a broader grid search of simple models.<\/p>\n<p>The <em>simple_forecast()<\/em> function below combines both strategies into a single function.<\/p>\n<pre class=\"crayon-plain-tag\"># one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn 
history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)<\/pre>\n<p><span style=\"font-weight: 400;\">Next, we need to build up some functions for fitting and evaluating a model repeatedly via walk-forward validation, including splitting a dataset into <\/span><span style=\"font-weight: 400;\">train<\/span><span style=\"font-weight: 400;\"> and test sets and evaluating one-step forecasts.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We can split a list or NumPy<\/span><span style=\"font-weight: 400;\"> array<\/span><span style=\"font-weight: 400;\"> of data using a slice given a specified size of the split, e.g. 
the number of time steps to use from the data in the test set.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The <em>train_test_split()<\/em> function below implements this for a provided dataset<\/span><span style=\"font-weight: 400;\"> and<\/span><span style=\"font-weight: 400;\"> a specified number of time steps to use in the test set.<\/span><\/p>\n<pre class=\"crayon-plain-tag\"># split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]<\/pre>\n<p><span style=\"font-weight: 400;\">After forecasts have been made for each step in the test dataset, they need to be compared to the test set in order to calculate an error score.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are many popular error scores for <\/span><span style=\"font-weight: 400;\">time<\/span><span style=\"font-weight: 400;\"> series forecasting. In this case, we will use root mean squared error (RMSE), but you can change this to your preferred measure, e.g. MAPE, MAE, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The <em>measure_rmse()<\/em> function below will calculate the RMSE given a list of actual (the test set) and predicted values.<\/span><\/p>\n<pre class=\"crayon-plain-tag\"># root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))<\/pre>\n<p><span style=\"font-weight: 400;\">We can now implement the <a href=\"https:\/\/machinelearningmastery.com\/backtest-machine-learning-models-time-series-forecasting\/\">walk-forward validation scheme<\/a>. 
This is a standard approach to evaluating a time series forecasting model that respects the temporal ordering of observations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, a provided univariate time series dataset is split into <\/span><span style=\"font-weight: 400;\">train<\/span><span style=\"font-weight: 400;\"> and test sets using the <em>train_test_split(<\/em><\/span><em><span style=\"font-weight: 400;\">)<\/span><\/em><span style=\"font-weight: 400;\"> function. Then the number of observations in the test set are enumerated. For each we fit a model on all of the history and make a one step forecast. The true observation for the time step is then added to the history,<\/span><span style=\"font-weight: 400;\"> and<\/span><span style=\"font-weight: 400;\"> the process is repeated. The <\/span><em><span style=\"font-weight: 400;\">simple_forecast<\/span><\/em><span style=\"font-weight: 400;\"><em>()<\/em> function is called in order to fit a model and make a prediction. 
Finally, an error score is calculated by comparing all one-step forecasts to the actual test set by calling the <em>measure_rmse()<\/em> function.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The <em>walk_forward_validation()<\/em> function below implements this, taking a univariate time series, a number of time steps to use in the test set, and a model configuration.<\/span><\/p>\n<pre class=\"crayon-plain-tag\"># walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# seed history with training dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error<\/pre>\n<p><span style=\"font-weight: 400;\">If you are interested in making multi-step predictions, you can change the forecast made in the <em>simple_forecast()<\/em> function and also change the calculation of error in the <em>measure_rmse()<\/em> function.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We can call <em>walk_forward_validation()<\/em> repeatedly with different lists of model configurations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One possible issue is that some combinations of model configurations may not be valid for the model and will throw an exception.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We 
can trap exceptions and ignore warnings during the grid search by wrapping all calls to <em>walk_forward_validation(<\/em><\/span><em><span style=\"font-weight: 400;\">)<\/span><\/em><span style=\"font-weight: 400;\"> with a try-except and a block to ignore warnings. We can also add debugging support to disable these protections in the case we want to see what is really going on. Finally, if an error does occur, we can return a <em>None<\/em> result; otherwise, we can print some information about the skill of each model evaluated. This is helpful when <\/span><span style=\"font-weight: 400;\">a large number of<\/span><span style=\"font-weight: 400;\"> models are evaluated.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The <em>score_model()<\/em> function below implements this and returns a <\/span><span style=\"font-weight: 400;\">tuple<\/span><span style=\"font-weight: 400;\"> of (key and result), where the key is a string version of the tested model configuration.<\/span><\/p>\n<pre class=\"crayon-plain-tag\"># score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept:\r\n\t\t\terror = None\r\n\t# check for an interesting result\r\n\tif result is not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)<\/pre>\n<p>Next, we need a loop to test a list of different model configurations.<\/p>\n<p>This is the main function that drives the grid search process and will call the <em>score_model()<\/em> 
function for each model configuration.<\/p>\n<p>We can dramatically speed up the grid search process by evaluating model configurations in parallel. One way to do that is to use the <a href=\"https:\/\/pythonhosted.org\/joblib\/\">Joblib library<\/a>.<\/p>\n<p>We can define a Parallel object with the number of cores to use and set it to the number of cores detected in your hardware.<\/p>\n<pre class=\"crayon-plain-tag\">executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')<\/pre>\n<p>We can then create a list of tasks to execute in parallel, which will be one call to the score_model() function for each model configuration we have.<\/p>\n<pre class=\"crayon-plain-tag\">tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)<\/pre>\n<p>Finally, we can use the <em>Parallel<\/em> object to execute the list of tasks in parallel.<\/p>\n<pre class=\"crayon-plain-tag\">scores = executor(tasks)<\/pre>\n<p>That\u2019s it.<\/p>\n<p>We can also provide a non-parallel version of evaluating all model configurations in case we want to debug something.<\/p>\n<pre class=\"crayon-plain-tag\">scores = [score_model(data, n_test, cfg) for cfg in cfg_list]<\/pre>\n<p>The result of evaluating a list of configurations will be a list of tuples, each with a name that summarizes a specific model configuration and the error of the model evaluated with that configuration as either the RMSE or <em>None<\/em> if there was an error.<\/p>\n<p>We can filter out all scores set to\u00a0<em>None<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\">scores = [r for r in scores if r[1] != None]<\/pre>\n<p>We can then sort all tuples in the list by the score in ascending order (best are first), then return this list of scores for review.<\/p>\n<p>The <em>grid_search()<\/em> function below implements this behavior given a univariate time series dataset, a list of model configurations (list of lists), and the number of time steps to use in the test set. 
An optional parallel argument allows the evaluation of models across all cores to be turned on or off, and is on by default.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] != None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores<\/pre>\n<p>We\u2019re nearly done.<\/p>\n<p>The only thing left to do is to define a list of model configurations to try for a dataset.<\/p>\n<p>We can define this generically. The only parameter we may want to specify is the periodicity of the seasonal component in the series (offset), if one exists. 
By default, we will assume no seasonal component.<\/p>\n<p>The <em>simple_configs()<\/em> function below will create a list of model configurations to evaluate.<\/p>\n<p>The function only requires the maximum length of the historical data as an argument and optionally the periodicity of any seasonal component, which is defaulted to 1 (no seasonal component).<\/p>\n<pre class=\"crayon-plain-tag\"># create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = list()\r\n\tfor i in range(1, max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs<\/pre>\n<p>We now have a framework for grid searching simple model hyperparameters via one-step walk-forward validation.<\/p>\n<p>It is generic and will work for any in-memory univariate time series provided as a list or NumPy array.<\/p>\n<p>We can make sure all the pieces work together by testing it on a contrived 10-step dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search simple forecasts\r\nfrom math import sqrt\r\nfrom numpy import mean\r\nfrom numpy import median\r\nfrom multiprocessing import cpu_count\r\nfrom joblib import Parallel\r\nfrom joblib import delayed\r\nfrom warnings import catch_warnings\r\nfrom warnings import filterwarnings\r\nfrom sklearn.metrics import mean_squared_error\r\n\r\n# one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, 
n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)\r\n\r\n# root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))\r\n\r\n# split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]\r\n\r\n# walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# seed history with training dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error\r\n\r\n# score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept:\r\n\t\t\terror = None\r\n\t# check for an interesting result\r\n\tif result is 
not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)\r\n\r\n# grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] != None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores\r\n\r\n# create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = list()\r\n\tfor i in range(1, max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs\r\n\r\nif __name__ == '__main__':\r\n\t# define dataset\r\n\tdata = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]\r\n\tprint(data)\r\n\t# data split\r\n\tn_test = 4\r\n\t# model configs\r\n\tmax_length = len(data) - n_test\r\n\tcfg_list = simple_configs(max_length)\r\n\t# grid search\r\n\tscores = grid_search(data, cfg_list, n_test)\r\n\tprint('done')\r\n\t# list top 3 configs\r\n\tfor cfg, error in scores[:3]:\r\n\t\tprint(cfg, error)<\/pre>\n<p>Running the example first prints the contrived time series dataset.<\/p>\n<p>Next, the model configurations and their errors are reported as they are evaluated.<\/p>\n<p>Finally, the configurations and the error for the top three configurations are reported.<\/p>\n<p>We can see that the persistence model with a configuration of 1 (e.g. 
persist the last observation) achieves the best performance of the simple models tested, as would be expected.<\/p>\n<pre class=\"crayon-plain-tag\">[10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]\r\n\r\n> Model[[1, 1, 'persist']] 10.000\r\n> Model[[2, 1, 'persist']] 20.000\r\n> Model[[2, 1, 'mean']] 15.000\r\n> Model[[2, 1, 'median']] 15.000\r\n> Model[[3, 1, 'persist']] 30.000\r\n> Model[[4, 1, 'persist']] 40.000\r\n> Model[[5, 1, 'persist']] 50.000\r\n> Model[[5, 1, 'mean']] 30.000\r\n> Model[[3, 1, 'mean']] 20.000\r\n> Model[[4, 1, 'median']] 25.000\r\n> Model[[6, 1, 'persist']] 60.000\r\n> Model[[4, 1, 'mean']] 25.000\r\n> Model[[3, 1, 'median']] 20.000\r\n> Model[[6, 1, 'mean']] 35.000\r\n> Model[[5, 1, 'median']] 30.000\r\n> Model[[6, 1, 'median']] 35.000\r\ndone\r\n\r\n[1, 1, 'persist'] 10.0\r\n[2, 1, 'mean'] 15.0\r\n[2, 1, 'median'] 15.0<\/pre>\n<p>Now that we have a robust framework for grid searching simple model hyperparameters, let\u2019s test it out on a suite of standard univariate time series datasets.<\/p>\n<p>The results demonstrated on each dataset provide a baseline of performance that can be used to compare more sophisticated methods, such as SARIMA, ETS, and even machine learning methods.<\/p>\n<h2>Case Study 1: No Trend or Seasonality<\/h2>\n<p>The \u2018daily female births\u2019 dataset summarizes the daily total female births in California, USA in 1959.<\/p>\n<p>The dataset has no obvious trend or seasonal component.<\/p>\n<div id=\"attachment_6350\" style=\"width: 1450px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6350\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Daily-Female-Births-Dataset.png\" alt=\"Line Plot of the Daily Female Births Dataset\" width=\"1440\" height=\"780\" 
srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Daily-Female-Births-Dataset.png 1440w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Daily-Female-Births-Dataset-300x163.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Daily-Female-Births-Dataset-768x416.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Daily-Female-Births-Dataset-1024x555.png 1024w\" sizes=\"(max-width: 1440px) 100vw, 1440px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of the Daily Female Births Dataset<\/p>\n<\/div>\n<p>You can learn more about the dataset from <a href=\"https:\/\/datamarket.com\/data\/set\/235k\/daily-total-female-births-in-california-1959#!ds=235k&#038;display=line\">DataMarket<\/a>.<\/p>\n<p>Download the dataset directly from here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/daily-total-female-births.csv\">daily-total-female-births.csv<\/a><\/li>\n<\/ul>\n<p>Save the file with the filename \u2018<em>daily-total-female-births.csv<\/em>\u2018 in your current working directory.<\/p>\n<p>We can load this dataset as a Pandas series using the function <em>read_csv()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\">series = read_csv('daily-total-female-births.csv', header=0, index_col=0)<\/pre>\n<p>The dataset has one year, or 365 observations. 
We will use the first 200 for training and the remaining 165 as the test set.<\/p>\n<p>The complete example grid searching the daily female births univariate time series forecasting problem is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search simple forecast for daily female births\r\nfrom math import sqrt\r\nfrom numpy import mean\r\nfrom numpy import median\r\nfrom multiprocessing import cpu_count\r\nfrom joblib import Parallel\r\nfrom joblib import delayed\r\nfrom warnings import catch_warnings\r\nfrom warnings import filterwarnings\r\nfrom sklearn.metrics import mean_squared_error\r\nfrom pandas import read_csv\r\n\r\n# one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)\r\n\r\n# root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))\r\n\r\n# split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]\r\n\r\n# walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# seed history with training 
dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error\r\n\r\n# score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept Exception:\r\n\t\t\tresult = None\r\n\t# check for an interesting result\r\n\tif result is not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)\r\n\r\n# grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] is not None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores\r\n\r\n# create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = list()\r\n\tfor i in range(1, 
max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs\r\n\r\nif __name__ == '__main__':\r\n\t# define dataset\r\n\tseries = read_csv('daily-total-female-births.csv', header=0, index_col=0)\r\n\tdata = series.values\r\n\tprint(data)\r\n\t# data split\r\n\tn_test = 165\r\n\t# model configs\r\n\tmax_length = len(data) - n_test\r\n\tcfg_list = simple_configs(max_length)\r\n\t# grid search\r\n\tscores = grid_search(data, cfg_list, n_test)\r\n\tprint('done')\r\n\t# list top 3 configs\r\n\tfor cfg, error in scores[:3]:\r\n\t\tprint(cfg, error)<\/pre>\n<p>Running the example prints the model configurations and their RMSE as the models are evaluated.<\/p>\n<p>The top three model configurations and their errors are reported at the end of the run.<\/p>\n<p>We can see that the best result was an RMSE of about 6.93 births with the following configuration:<\/p>\n<ul>\n<li><strong>Strategy<\/strong>: Average<\/li>\n<li><strong>n<\/strong>: 22<\/li>\n<li><strong>function<\/strong>: mean()<\/li>\n<\/ul>\n<p>This is surprising given the lack of trend or seasonality. I would have expected either persisting the last observation (n=1) or averaging the entire historical dataset to result in the best performance.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n> Model[[186, 1, 'mean']] 7.523\r\n> Model[[200, 1, 'median']] 7.681\r\n> Model[[186, 1, 'median']] 7.691\r\n> Model[[187, 1, 'persist']] 11.137\r\n> Model[[187, 1, 'mean']] 7.527\r\ndone\r\n\r\n[22, 1, 'mean'] 6.930411499775709\r\n[23, 1, 'mean'] 6.932293117115201\r\n[21, 1, 'mean'] 6.951918385845375<\/pre>\n<\/p>\n<h2>Case Study 2: Trend<\/h2>\n<p>The \u2018shampoo\u2019 dataset summarizes the monthly sales of shampoo over a three-year period.<\/p>\n<p>The dataset contains an obvious trend but no obvious seasonal component.<\/p>\n<div id=\"attachment_6351\" style=\"width: 1448px\" class=\"wp-caption aligncenter\"><img 
loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6351\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Shampoo-Sales-Dataset.png\" alt=\"Line Plot of the Monthly Shampoo Sales Dataset\" width=\"1438\" height=\"776\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Shampoo-Sales-Dataset.png 1438w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Shampoo-Sales-Dataset-300x162.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Shampoo-Sales-Dataset-768x414.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Shampoo-Sales-Dataset-1024x553.png 1024w\" sizes=\"(max-width: 1438px) 100vw, 1438px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of the Monthly Shampoo Sales Dataset<\/p>\n<\/div>\n<p>You can learn more about the dataset from <a href=\"https:\/\/datamarket.com\/data\/set\/22r0\/sales-of-shampoo-over-a-three-year-period#!ds=22r0&#038;display=line\">DataMarket<\/a>.<\/p>\n<p>Download the dataset directly from here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/shampoo.csv\">shampoo.csv<\/a><\/li>\n<\/ul>\n<p>Save the file with the filename \u2018<em>shampoo.csv<\/em>\u2018 in your current working directory.<\/p>\n<p>We can load this dataset as a Pandas series using the function <em>read_csv()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\"># parse dates\r\ndef custom_parser(x):\r\n\treturn datetime.strptime('195'+x, '%Y-%m')\r\n\r\n# load dataset\r\nseries = read_csv('shampoo.csv', header=0, index_col=0, date_parser=custom_parser)<\/pre>\n<p>The dataset has three years, or 36 observations. 
We will use the first 24 for training and the remaining 12 as the test set.<\/p>\n<p>The complete example grid searching the shampoo sales univariate time series forecasting problem is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search simple forecast for monthly shampoo sales\r\nfrom math import sqrt\r\nfrom numpy import mean\r\nfrom numpy import median\r\nfrom multiprocessing import cpu_count\r\nfrom joblib import Parallel\r\nfrom joblib import delayed\r\nfrom warnings import catch_warnings\r\nfrom warnings import filterwarnings\r\nfrom sklearn.metrics import mean_squared_error\r\nfrom pandas import read_csv\r\nfrom datetime import datetime\r\n\r\n# one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)\r\n\r\n# root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))\r\n\r\n# split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]\r\n\r\n# walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# 
seed history with training dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error\r\n\r\n# score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept Exception:\r\n\t\t\tresult = None\r\n\t# check for an interesting result\r\n\tif result is not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)\r\n\r\n# grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] is not None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores\r\n\r\n# create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = 
list()\r\n\tfor i in range(1, max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs\r\n\r\n# parse dates\r\ndef custom_parser(x):\r\n\treturn datetime.strptime('195'+x, '%Y-%m')\r\n\r\nif __name__ == '__main__':\r\n\t# load dataset\r\n\tseries = read_csv('shampoo.csv', header=0, index_col=0, date_parser=custom_parser)\r\n\tdata = series.values\r\n\tprint(data.shape)\r\n\t# data split\r\n\tn_test = 12\r\n\t# model configs\r\n\tmax_length = len(data) - n_test\r\n\tcfg_list = simple_configs(max_length)\r\n\t# grid search\r\n\tscores = grid_search(data, cfg_list, n_test)\r\n\tprint('done')\r\n\t# list top 3 configs\r\n\tfor cfg, error in scores[:3]:\r\n\t\tprint(cfg, error)<\/pre>\n<p>Running the example prints the model configurations and their RMSE as the models are evaluated.<\/p>\n<p>The top three model configurations and their errors are reported at the end of the run.<\/p>\n<p>We can see that the best result was an RMSE of about 95.69 sales with the following configuration:<\/p>\n<ul>\n<li><strong>Strategy<\/strong>: Persist<\/li>\n<li><strong>n<\/strong>: 2<\/li>\n<\/ul>\n<p>This is surprising, as the trend structure of the data would suggest that persisting the previous value (n=1) would be the best approach, not persisting the second-to-last value.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n> Model[[23, 1, 'mean']] 209.782\r\n> Model[[23, 1, 'median']] 221.863\r\n> Model[[24, 1, 'persist']] 305.635\r\n> Model[[24, 1, 'mean']] 213.466\r\n> Model[[24, 1, 'median']] 226.061\r\ndone\r\n\r\n[2, 1, 'persist'] 95.69454007413378\r\n[2, 1, 'mean'] 96.01140340258198\r\n[2, 1, 'median'] 96.01140340258198<\/pre>\n<\/p>\n<h2>Case Study 3: Seasonality<\/h2>\n<p>The \u2018monthly mean temperatures\u2019 dataset summarizes the monthly average air temperatures at Nottingham Castle, England from 1920 to 1939 in degrees Fahrenheit.<\/p>\n<p>The dataset has an obvious 
seasonal component and no obvious trend.<\/p>\n<div id=\"attachment_6352\" style=\"width: 1464px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6352\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Mean-Temperatures-Dataset.png\" alt=\"Line Plot of the Monthly Mean Temperatures Dataset\" width=\"1454\" height=\"766\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Mean-Temperatures-Dataset.png 1454w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Mean-Temperatures-Dataset-300x158.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Mean-Temperatures-Dataset-768x405.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Mean-Temperatures-Dataset-1024x539.png 1024w\" sizes=\"(max-width: 1454px) 100vw, 1454px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of the Monthly Mean Temperatures Dataset<\/p>\n<\/div>\n<p>You can learn more about the dataset from <a href=\"https:\/\/datamarket.com\/data\/set\/22li\/mean-monthly-air-temperature-deg-f-nottingham-castle-1920-1939#!ds=22li&#038;display=line\">DataMarket<\/a>.<\/p>\n<p>Download the dataset directly from here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/monthly-mean-temp.csv\">monthly-mean-temp.csv<\/a><\/li>\n<\/ul>\n<p>Save the file with the filename \u2018<em>monthly-mean-temp.csv<\/em>\u2018 in your current working directory.<\/p>\n<p>We can load this dataset as a Pandas series using the function <em>read_csv()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\">series = read_csv('monthly-mean-temp.csv', header=0, index_col=0)<\/pre>\n<p>The dataset has 20 years, or 240 
observations. Trimming the dataset to the last five years of data (60 observations) would speed up the model evaluation process, although the complete example listed below searches all 240 observations. We will use the last year, or 12 observations, for the test set.<\/p>\n<pre class=\"crayon-plain-tag\"># optionally trim dataset to 5 years\r\ndata = data[-(5*12):]<\/pre>\n<p>The period of the seasonal component is about one year, or 12 observations. We will pass this as an offset, alongside the default offset of 1, in the call to the <em>simple_configs()<\/em> function when preparing the model configurations.<\/p>\n<pre class=\"crayon-plain-tag\"># model configs\r\ncfg_list = simple_configs(max_length, offsets=[1,12])<\/pre>\n<p>The complete example grid searching the monthly mean temperature time series forecasting problem is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search simple forecast for monthly mean temperature\r\nfrom math import sqrt\r\nfrom numpy import mean\r\nfrom numpy import median\r\nfrom multiprocessing import cpu_count\r\nfrom joblib import Parallel\r\nfrom joblib import delayed\r\nfrom warnings import catch_warnings\r\nfrom warnings import filterwarnings\r\nfrom sklearn.metrics import mean_squared_error\r\nfrom pandas import read_csv\r\n\r\n# one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n 
values\r\n\treturn median(values)\r\n\r\n# root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))\r\n\r\n# split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]\r\n\r\n# walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# seed history with training dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error\r\n\r\n# score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept Exception:\r\n\t\t\tresult = None\r\n\t# check for an interesting result\r\n\tif result is not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)\r\n\r\n# grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = 
Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] is not None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores\r\n\r\n# create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = list()\r\n\tfor i in range(1, max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs\r\n\r\nif __name__ == '__main__':\r\n\t# define dataset\r\n\tseries = read_csv('monthly-mean-temp.csv', header=0, index_col=0)\r\n\tdata = series.values\r\n\tprint(data)\r\n\t# data split\r\n\tn_test = 12\r\n\t# model configs\r\n\tmax_length = len(data) - n_test\r\n\tcfg_list = simple_configs(max_length, offsets=[1,12])\r\n\t# grid search\r\n\tscores = grid_search(data, cfg_list, n_test)\r\n\tprint('done')\r\n\t# list top 3 configs\r\n\tfor cfg, error in scores[:3]:\r\n\t\tprint(cfg, error)<\/pre>\n<p>Running the example prints the model configurations and their RMSE as the models are evaluated.<\/p>\n<p>The top three model configurations and their errors are reported at the end of the run.<\/p>\n<p>We can see that the best result was an RMSE of about 1.501 degrees with the following configuration:<\/p>\n<ul>\n<li><strong>Strategy<\/strong>: Average<\/li>\n<li><strong>n<\/strong>: 4<\/li>\n<li><strong>offset<\/strong>: 12<\/li>\n<li><strong>function<\/strong>: mean()<\/li>\n<\/ul>\n<p>This finding is not too surprising. 
Given the seasonal structure of the data, we would expect a function of the last few observations at prior points in the yearly cycle to be effective.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n> Model[[227, 12, 'persist']] 5.365\r\n> Model[[228, 1, 'persist']] 2.818\r\n> Model[[228, 1, 'mean']] 8.258\r\n> Model[[228, 1, 'median']] 8.361\r\n> Model[[228, 12, 'persist']] 2.818\r\ndone\r\n[4, 12, 'mean'] 1.5015616870445234\r\n[8, 12, 'mean'] 1.5794579766489512\r\n[13, 12, 'mean'] 1.586186052546763<\/pre>\n<\/p>\n<h2>Case Study 4: Trend and Seasonality<\/h2>\n<p>The \u2018monthly car sales\u2019 dataset summarizes the monthly car sales in Quebec, Canada between 1960 and 1968.<\/p>\n<p>The dataset has an obvious trend and seasonal component.<\/p>\n<div id=\"attachment_6353\" style=\"width: 1472px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6353\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Car-Sales-Dataset.png\" alt=\"Line Plot of the Monthly Car Sales Dataset\" width=\"1462\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Car-Sales-Dataset.png 1462w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Car-Sales-Dataset-300x158.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Car-Sales-Dataset-768x403.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Line-Plot-of-the-Monthly-Car-Sales-Dataset-1024x538.png 1024w\" sizes=\"(max-width: 1462px) 100vw, 1462px\"><\/p>\n<p class=\"wp-caption-text\">Line Plot of the Monthly Car Sales Dataset<\/p>\n<\/div>\n<p>You can learn more about the dataset from <a 
href=\"https:\/\/datamarket.com\/data\/set\/22n4\/monthly-car-sales-in-quebec-1960-1968#!ds=22n4&#038;display=line\">DataMarket<\/a>.<\/p>\n<p>Download the dataset directly from here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/monthly-car-sales.csv\">monthly-car-sales.csv<\/a><\/li>\n<\/ul>\n<p>Save the file with the filename \u2018<em>monthly-car-sales.csv<\/em>\u2019 in your current working directory.<\/p>\n<p>We can load this dataset as a Pandas series using the function <em>read_csv()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\">series = read_csv('monthly-car-sales.csv', header=0, index_col=0)<\/pre>\n<p>The dataset has 9 years, or 108 observations. We will use the last year, or 12 observations, as the test set.<\/p>\n<p>The period of the seasonal component could be six months or 12 months. The complete example below passes an offset of 12, alongside the default offset of 1, in the call to the <em>simple_configs()<\/em> function when preparing the model configurations; an offset of 6 could be added to the list in the same way.<\/p>\n<pre class=\"crayon-plain-tag\"># model configs\r\ncfg_list = simple_configs(max_length, offsets=[1,12])<\/pre>\n<p>The complete example grid searching the monthly car sales time series forecasting problem is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># grid search simple forecast for monthly car sales\r\nfrom math import sqrt\r\nfrom numpy import mean\r\nfrom numpy import median\r\nfrom multiprocessing import cpu_count\r\nfrom joblib import Parallel\r\nfrom joblib import delayed\r\nfrom warnings import catch_warnings\r\nfrom warnings import filterwarnings\r\nfrom sklearn.metrics import mean_squared_error\r\nfrom pandas import read_csv\r\n\r\n# one-step simple forecast\r\ndef simple_forecast(history, config):\r\n\tn, offset, avg_type = config\r\n\t# persist value, ignore other config\r\n\tif avg_type == 'persist':\r\n\t\treturn history[-n]\r\n\t# collect values to average\r\n\tvalues = list()\r\n\tif offset == 1:\r\n\t\tvalues = history[-n:]\r\n\telse:\r\n\t\t# skip bad configs\r\n\t\tif 
n*offset > len(history):\r\n\t\t\traise Exception('Config beyond end of data: %d %d' % (n,offset))\r\n\t\t# try and collect n values using offset\r\n\t\tfor i in range(1, n+1):\r\n\t\t\tix = i * offset\r\n\t\t\tvalues.append(history[-ix])\r\n\t# check if we can average\r\n\tif len(values) < 2:\r\n\t\traise Exception('Cannot calculate average')\r\n\t# mean of last n values\r\n\tif avg_type == 'mean':\r\n\t\treturn mean(values)\r\n\t# median of last n values\r\n\treturn median(values)\r\n\r\n# root mean squared error or rmse\r\ndef measure_rmse(actual, predicted):\r\n\treturn sqrt(mean_squared_error(actual, predicted))\r\n\r\n# split a univariate dataset into train\/test sets\r\ndef train_test_split(data, n_test):\r\n\treturn data[:-n_test], data[-n_test:]\r\n\r\n# walk-forward validation for univariate data\r\ndef walk_forward_validation(data, n_test, cfg):\r\n\tpredictions = list()\r\n\t# split dataset\r\n\ttrain, test = train_test_split(data, n_test)\r\n\t# seed history with training dataset\r\n\thistory = [x for x in train]\r\n\t# step over each time-step in the test set\r\n\tfor i in range(len(test)):\r\n\t\t# fit model and make forecast for history\r\n\t\tyhat = simple_forecast(history, cfg)\r\n\t\t# store forecast in list of predictions\r\n\t\tpredictions.append(yhat)\r\n\t\t# add actual observation to history for the next loop\r\n\t\thistory.append(test[i])\r\n\t# estimate prediction error\r\n\terror = measure_rmse(test, predictions)\r\n\treturn error\r\n\r\n# score a model, return None on failure\r\ndef score_model(data, n_test, cfg, debug=False):\r\n\tresult = None\r\n\t# convert config to a key\r\n\tkey = str(cfg)\r\n\t# show all warnings and fail on exception if debugging\r\n\tif debug:\r\n\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\telse:\r\n\t\t# one failure during model validation suggests an unstable config\r\n\t\ttry:\r\n\t\t\t# never show warnings when grid searching, too noisy\r\n\t\t\twith 
catch_warnings():\r\n\t\t\t\tfilterwarnings(\"ignore\")\r\n\t\t\t\tresult = walk_forward_validation(data, n_test, cfg)\r\n\t\texcept Exception:\r\n\t\t\tresult = None\r\n\t# check for an interesting result\r\n\tif result is not None:\r\n\t\tprint(' > Model[%s] %.3f' % (key, result))\r\n\treturn (key, result)\r\n\r\n# grid search configs\r\ndef grid_search(data, cfg_list, n_test, parallel=True):\r\n\tscores = None\r\n\tif parallel:\r\n\t\t# execute configs in parallel\r\n\t\texecutor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')\r\n\t\ttasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)\r\n\t\tscores = executor(tasks)\r\n\telse:\r\n\t\tscores = [score_model(data, n_test, cfg) for cfg in cfg_list]\r\n\t# remove empty results\r\n\tscores = [r for r in scores if r[1] is not None]\r\n\t# sort configs by error, asc\r\n\tscores.sort(key=lambda tup: tup[1])\r\n\treturn scores\r\n\r\n# create a set of simple configs to try\r\ndef simple_configs(max_length, offsets=[1]):\r\n\tconfigs = list()\r\n\tfor i in range(1, max_length+1):\r\n\t\tfor o in offsets:\r\n\t\t\tfor t in ['persist', 'mean', 'median']:\r\n\t\t\t\tcfg = [i, o, t]\r\n\t\t\t\tconfigs.append(cfg)\r\n\treturn configs\r\n\r\nif __name__ == '__main__':\r\n\t# define dataset\r\n\tseries = read_csv('monthly-car-sales.csv', header=0, index_col=0)\r\n\tdata = series.values\r\n\tprint(data)\r\n\t# data split\r\n\tn_test = 12\r\n\t# model configs\r\n\tmax_length = len(data) - n_test\r\n\tcfg_list = simple_configs(max_length, offsets=[1,12])\r\n\t# grid search\r\n\tscores = grid_search(data, cfg_list, n_test)\r\n\tprint('done')\r\n\t# list top 3 configs\r\n\tfor cfg, error in scores[:3]:\r\n\t\tprint(cfg, error)<\/pre>\n<p>Running the example prints the model configurations and their RMSE as the models are evaluated.<\/p>\n<p>The top three model configurations and their errors are reported at the end of the run.<\/p>\n<p>We can see that the best result was an RMSE of about 1841.155 sales with the 
following configuration:<\/p>\n<ul>\n<li><strong>Strategy<\/strong>: Average<\/li>\n<li><strong>n<\/strong>: 3<\/li>\n<li><strong>offset<\/strong>: 12<\/li>\n<li><strong>function<\/strong>: median()<\/li>\n<\/ul>\n<p>It is not surprising that the chosen model is a function of the last few observations at the same point in prior cycles, although the use of the median instead of the mean may not have been immediately obvious and the results were much better than the mean.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n> Model[[79, 1, 'median']] 5124.113\r\n> Model[[91, 12, 'persist']] 9580.149\r\n> Model[[79, 12, 'persist']] 8641.529\r\n> Model[[92, 1, 'persist']] 9830.921\r\n> Model[[92, 1, 'mean']] 5148.126\r\ndone\r\n[3, 12, 'median'] 1841.1559321976688\r\n[3, 12, 'mean'] 2115.198495632485\r\n[4, 12, 'median'] 2184.37708988932<\/pre>\n<\/p>\n<h2>Extensions<\/h2>\n<p>This section lists some ideas for extending the tutorial that you may wish to explore.<\/p>\n<ul>\n<li><strong>Plot Forecast<\/strong>. Update the framework to re-fit a model with the best configuration and forecast the entire test dataset, then plot the forecast compared to the actual observations in the test set.<\/li>\n<li><strong>Drift Method<\/strong>. Implement the drift method for simple forecasts and compare the results to the average and naive methods.<\/li>\n<li><strong>Another Dataset<\/strong>. Apply the developed framework to an additional univariate time series problem (e.g. 
from the <a href=\"https:\/\/datamarket.com\/data\/list\/?q=provider:tsdl\">Time Series Dataset Library<\/a>).<\/li>\n<\/ul>\n<p>If you explore any of these extensions, I\u2019d love to know.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Forecasting\">Forecasting, Wikipedia<\/a><\/li>\n<li><a href=\"https:\/\/pythonhosted.org\/joblib\/\">Joblib: running Python functions as pipeline jobs<\/a><\/li>\n<li><a href=\"https:\/\/datamarket.com\/data\/list\/?q=provider:tsdl\">Time Series Dataset Library<\/a>, DataMarket.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a framework from scratch for grid searching simple naive and averaging strategies for time series forecasting with univariate data.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to develop a framework for grid searching simple models from scratch using walk-forward validation.<\/li>\n<li>How to grid search simple model hyperparameters for daily time series data for births.<\/li>\n<li>How to grid search simple model hyperparameters for monthly time series data for shampoo sales, car sales, and temperature.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting\/\">How to Grid Search Naive Methods for Univariate Time Series Forecasting<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Simple forecasting 
methods include naively using the last observation as the prediction or an average of prior observations. It is important to [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/25\/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":1213,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1212"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1212"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1212\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/1213"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1212"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1212"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1212"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}