{"id":961,"date":"2018-08-21T19:00:24","date_gmt":"2018-08-21T19:00:24","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/21\/4-common-machine-learning-data-transforms-for-time-series-forecasting\/"},"modified":"2018-08-21T19:00:24","modified_gmt":"2018-08-21T19:00:24","slug":"4-common-machine-learning-data-transforms-for-time-series-forecasting","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/21\/4-common-machine-learning-data-transforms-for-time-series-forecasting\/","title":{"rendered":"4 Common Machine Learning Data Transforms for Time Series Forecasting"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Time series data often requires some preparation prior to being modeled with machine learning algorithms.<\/p>\n<p>For example, differencing operations can be used to remove trend and seasonal structure from the sequence in order to simplify the prediction problem. Some algorithms, such as neural networks, prefer data to be standardized and\/or normalized prior to modeling.<\/p>\n<p>Any transform operations applied to the series also require a similar inverse transform to be applied on the predictions. 
This is required so that the calculated performance measures are in the same scale as the output variable and can be compared to classical forecasting methods.<\/p>\n<p>In this post, you will discover how to perform and invert four common data transforms for time series data in machine learning.<\/p>\n<p>After reading this post, you will know:<\/p>\n<ul>\n<li>How to transform and inverse the transform for four methods in Python.<\/li>\n<li>Important considerations when using transforms on training and test datasets.<\/li>\n<li>The suggested order for transforms when multiple operations are required on a dataset.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_6010\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6010\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/08\/4-Common-Machine-Learning-Data-Transforms-for-Time-Series-Forecasting.jpg\" alt=\"4 Common Machine Learning Data Transforms for Time Series Forecasting\" width=\"640\" height=\"379\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/08\/4-Common-Machine-Learning-Data-Transforms-for-Time-Series-Forecasting.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/08\/4-Common-Machine-Learning-Data-Transforms-for-Time-Series-Forecasting-300x178.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">4 Common Machine Learning Data Transforms for Time Series Forecasting<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/wolfgangstaudt\/2200561848\/\">Wolfgang Staudt<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Transforms for Time Series Data<\/li>\n<li>Considerations for Model Evaluation<\/li>\n<li>Order of Data Transforms<\/li>\n<\/ol>\n<h2>Transforms for 
Time Series Data<\/h2>\n<p>Given a univariate time series dataset, there are four transforms that are popular when using machine learning methods to model and make predictions.<\/p>\n<p>They are:<\/p>\n<ul>\n<li>Power Transform<\/li>\n<li>Difference Transform<\/li>\n<li>Standardization<\/li>\n<li>Normalization<\/li>\n<\/ul>\n<p>Let\u2019s take a quick look at each in turn and how to perform these transforms in Python.<\/p>\n<p>We will also review how to reverse the transform operation as this is required when we want to evaluate the predictions in their original scale so that performance measures can be compared directly.<\/p>\n<p>Are there other transforms you like to use on your time series data for modeling with machine learning methods?<br \/>\nLet me know in the comments below.<\/p>\n<h3>Power Transform<\/h3>\n<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Power_transform\">power transform<\/a> removes a skew from a data distribution to make the distribution more normal (Gaussian).<\/p>\n<p>On a time series dataset, this can have the effect of removing a change in variance over time.<\/p>\n<p>Popular examples are the log transform (positive values) or generalized versions such as the Box-Cox transform (positive values) or the Yeo-Johnson transform (positive and negative values).<\/p>\n<p>For example, we can implement the Box-Cox transform in Python using the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.boxcox.html\">boxcox() function<\/a> from the SciPy library.<\/p>\n<p>By default, the method will numerically optimize the lambda value for the transform and return the optimal value.<\/p>\n<pre class=\"crayon-plain-tag\">from scipy.stats import boxcox\r\n# define data\r\ndata = ...\r\n# box-cox transform\r\nresult, lmbda = boxcox(data)<\/pre>\n<p>The transform can be inverted but requires a custom function listed below named <em>invert_boxcox()<\/em> that takes a transformed value and the lambda value that was used to 
perform the transform.<\/p>\n<pre class=\"crayon-plain-tag\">from math import log\r\nfrom math import exp\r\n# invert a boxcox transform for one value\r\ndef invert_boxcox(value, lam):\r\n\t# log case\r\n\tif lam == 0:\r\n\t\treturn exp(value)\r\n\t# all other cases\r\n\treturn exp(log(lam * value + 1) \/ lam)<\/pre>\n<p>A complete example of applying the power transform to a dataset and reversing the transform is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of power transform and inversion\r\nfrom math import log\r\nfrom math import exp\r\nfrom scipy.stats import boxcox\r\n\r\n# invert a boxcox transform for one value\r\ndef invert_boxcox(value, lam):\r\n\t# log case\r\n\tif lam == 0:\r\n\t\treturn exp(value)\r\n\t# all other cases\r\n\treturn exp(log(lam * value + 1) \/ lam)\r\n\r\n\r\n# define dataset\r\ndata = [x for x in range(1, 10)]\r\nprint(data)\r\n# power transform\r\ntransformed, lmbda = boxcox(data)\r\nprint(transformed, lmbda)\r\n# invert transform\r\ninverted = [invert_boxcox(x, lmbda) for x in transformed]\r\nprint(inverted)<\/pre>\n<p>Running the example prints the original dataset, the results of the power transform, and the original values (or values close to them) after the transform is inverted.<\/p>\n<pre class=\"crayon-plain-tag\">[1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n[0.         0.89887536 1.67448353 2.37952145 3.03633818 3.65711928\r\n 4.2494518  4.81847233 5.36786648] 0.7200338588580095\r\n[1.0, 2.0, 2.9999999999999996, 3.999999999999999, 5.000000000000001, 6.000000000000001, 6.999999999999999, 7.999999999999998, 8.999999999999998]<\/pre>\n<h3>Difference Transform<\/h3>\n<p>A difference transform is a simple way of removing a systematic structure from the time series.<\/p>\n<p>For example, a trend can be removed by subtracting the previous value from each value in the series. This is called first order differencing. The process can be repeated (e.g. 
difference the differenced series) to remove second order trends, and so on.<\/p>\n<p>A seasonal structure can be removed in a similar way by subtracting the observation from the prior season, e.g. 12 time steps ago for monthly data with a yearly seasonal structure.<\/p>\n<p>A single differenced value in a series can be calculated with a custom function named <em>difference()<\/em> listed below. The function takes the time series and the interval for the difference calculation, e.g. 1 for a trend difference or 12 for a seasonal difference.<\/p>\n<pre class=\"crayon-plain-tag\"># difference dataset\r\ndef difference(data, interval):\r\n\treturn [data[i] - data[i - interval] for i in range(interval, len(data))]<\/pre>\n<p>Again, this operation can be inverted with a custom function named <em>invert_difference()<\/em> that adds the original value back to each differenced value; it takes the original series, the differenced series, and the interval.<\/p>\n<pre class=\"crayon-plain-tag\"># invert difference\r\ndef invert_difference(orig_data, diff_data, interval):\r\n\treturn [diff_data[i-interval] + orig_data[i-interval] for i in range(interval, len(orig_data))]<\/pre>\n<p>We can demonstrate these functions below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a difference transform\r\n\r\n# difference dataset\r\ndef difference(data, interval):\r\n\treturn [data[i] - data[i - interval] for i in range(interval, len(data))]\r\n\r\n# invert difference\r\ndef invert_difference(orig_data, diff_data, interval):\r\n\treturn [diff_data[i-interval] + orig_data[i-interval] for i in range(interval, len(orig_data))]\r\n\r\n# define dataset\r\ndata = [x for x in range(1, 10)]\r\nprint(data)\r\n# difference transform\r\ntransformed = difference(data, 1)\r\nprint(transformed)\r\n# invert difference\r\ninverted = invert_difference(data, transformed, 1)\r\nprint(inverted)<\/pre>\n<p>Running the example prints the original dataset, the results of the difference transform, and the original values after the 
transform is inverted.<\/p>\n<p>Note that the first \u201cinterval\u201d values will be lost from the sequence after the transform. This is because they do not have a value at \u201cinterval\u201d prior time steps and therefore cannot be differenced.<\/p>\n<pre class=\"crayon-plain-tag\">[1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n[1, 1, 1, 1, 1, 1, 1, 1]\r\n[2, 3, 4, 5, 6, 7, 8, 9]<\/pre>\n<h3>Standardization<\/h3>\n<p>Standardization is a transform for data with a Gaussian distribution.<\/p>\n<p>It subtracts the mean and divides the result by the standard deviation of the data sample. This has the effect of transforming the data to have a mean of zero (i.e. to be centered) and a standard deviation of 1. The resulting distribution is called a standard Gaussian distribution, or a standard normal, hence the name of the transform.<\/p>\n<p>We can perform standardization using the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html\">StandardScaler<\/a> object in Python from the scikit-learn library.<\/p>\n<p>This class allows the transform to be fit on a training dataset by calling <em>fit()<\/em>, applied to one or more datasets (e.g. 
train and test) by calling <em>transform()<\/em> and also provides a function to reverse the transform by calling <em>inverse_transform()<\/em>.<\/p>\n<p>A complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of standardization\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom numpy import array\r\n# define dataset\r\ndata = [x for x in range(1, 10)]\r\ndata = array(data).reshape(len(data), 1)\r\nprint(data)\r\n# fit transform\r\ntransformer = StandardScaler()\r\ntransformer.fit(data)\r\n# apply transform\r\ntransformed = transformer.transform(data)\r\nprint(transformed)\r\n# invert transform\r\ninverted = transformer.inverse_transform(transformed)\r\nprint(inverted)<\/pre>\n<p>Running the example prints the original dataset, the results of the standardize transform, and the original values after the transform is inverted.<\/p>\n<p>Note the expectation that data is provided as a column with multiple rows.<\/p>\n<pre class=\"crayon-plain-tag\">[[1]\r\n [2]\r\n [3]\r\n [4]\r\n [5]\r\n [6]\r\n [7]\r\n [8]\r\n [9]]\r\n\r\n[[-1.54919334]\r\n [-1.161895  ]\r\n [-0.77459667]\r\n [-0.38729833]\r\n [ 0.        ]\r\n [ 0.38729833]\r\n [ 0.77459667]\r\n [ 1.161895  ]\r\n [ 1.54919334]]\r\n\r\n[[1.]\r\n [2.]\r\n [3.]\r\n [4.]\r\n [5.]\r\n [6.]\r\n [7.]\r\n [8.]\r\n [9.]]<\/pre>\n<h3>Normalization<\/h3>\n<p>Normalization is a rescaling of data from the original range to a new range between 0 and 1.<\/p>\n<p>As with standardization, this can be implemented using a transform object from the scikit-learn library, specifically the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.MinMaxScaler.html\">MinMaxScaler<\/a> class. 
In addition to normalization, this class can be used to rescale data to any range you wish by specifying the preferred range in the constructor of the object.<\/p>\n<p>It can be used in the same way to fit, transform, and inverse the transform.<\/p>\n<p>A complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of normalization\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom numpy import array\r\n# define dataset\r\ndata = [x for x in range(1, 10)]\r\ndata = array(data).reshape(len(data), 1)\r\nprint(data)\r\n# fit transform\r\ntransformer = MinMaxScaler()\r\ntransformer.fit(data)\r\n# apply transform\r\ntransformed = transformer.transform(data)\r\nprint(transformed)\r\n# invert transform\r\ninverted = transformer.inverse_transform(transformed)\r\nprint(inverted)<\/pre>\n<p>Running the example prints the original dataset, the results of the normalize transform, and the original values after the transform is inverted.<\/p>\n<pre class=\"crayon-plain-tag\">[[1]\r\n [2]\r\n [3]\r\n [4]\r\n [5]\r\n [6]\r\n [7]\r\n [8]\r\n [9]]\r\n\r\n[[0.   ]\r\n [0.125]\r\n [0.25 ]\r\n [0.375]\r\n [0.5  ]\r\n [0.625]\r\n [0.75 ]\r\n [0.875]\r\n [1.   ]]\r\n\r\n[[1.]\r\n [2.]\r\n [3.]\r\n [4.]\r\n [5.]\r\n [6.]\r\n [7.]\r\n [8.]\r\n [9.]]<\/pre>\n<h2>Considerations for Model Evaluation<\/h2>\n<p>We have mentioned the importance of being able to invert a transform on the predictions of a model in order to calculate a model performance statistic that is directly comparable to other methods.<\/p>\n<p>Another concern is the problem of data leakage.<\/p>\n<p>Three of the above data transforms estimate coefficients from a provided dataset that are then used to transform the data. 
Specifically:<\/p>\n<ul>\n<li><strong>Power Transform<\/strong>: lambda parameter.<\/li>\n<li><strong>Standardization<\/strong>: mean and standard deviation statistics.<\/li>\n<li><strong>Normalization<\/strong>: min and max values.<\/li>\n<\/ul>\n<p>These coefficients must be estimated on the training dataset only.<\/p>\n<p>Once estimated, the coefficients can be used to apply the transform to both the training and the test datasets before evaluating your model.<\/p>\n<p>If the coefficients are estimated using the entire dataset prior to splitting into train and test sets, then there is a small leakage of information from the test set to the training dataset. This can result in estimates of model skill that are optimistically biased.<\/p>\n<p>As such, you may want to enhance the estimates of the coefficients with domain knowledge, such as expected min\/max values for all time in the future.<\/p>\n<p>Generally, differencing does not suffer the same problems. In most cases, such as one-step forecasting, the lag observations are available to perform the difference calculation. If not, the lag predictions can be used wherever needed as a proxy for the true observations in difference calculations.<\/p>\n<h2>Order of Data Transforms<\/h2>\n<p>You may want to experiment with applying multiple data transforms to a time series prior to modeling.<\/p>\n<p>This is quite common, e.g. 
to apply a power transform to remove an increasing variance, to apply seasonal differencing to remove seasonality, and to apply one-step differencing to remove a trend.<\/p>\n<p>The order in which the transform operations are applied is important.<\/p>\n<p>Intuitively, we can think through how the transforms may interact.<\/p>\n<ul>\n<li>Power transforms should probably be performed prior to differencing.<\/li>\n<li>Seasonal differencing should be performed prior to one-step differencing.<\/li>\n<li>Standardization is linear and should be performed on the sample after any nonlinear transforms and differencing.<\/li>\n<li>Normalization is a linear operation but it should be the final transform performed to maintain the preferred scale.<\/li>\n<\/ul>\n<p>As such, a suggested ordering for data transforms is as follows:<\/p>\n<ol>\n<li>Power Transform.<\/li>\n<li>Seasonal Difference.<\/li>\n<li>Trend Difference.<\/li>\n<li>Standardization.<\/li>\n<li>Normalization.<\/li>\n<\/ol>\n<p>Obviously, you would only use the transforms required for your specific dataset.<\/p>\n<p>Importantly, when the transform operations are inverted, the order of the inverse transform operations must be reversed. 
Specifically, the inverse operations must be performed in the following order:<\/p>\n<ol>\n<li>Normalization.<\/li>\n<li>Standardization.<\/li>\n<li>Trend Difference.<\/li>\n<li>Seasonal Difference.<\/li>\n<li>Power Transform.<\/li>\n<\/ol>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Posts<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/power-transform-time-series-forecast-data-python\/\">How to Use Power Transforms for Time Series Forecast Data with Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/remove-trends-seasonality-difference-transform-python\/\">How to Remove Trends and Seasonality with a Difference Transform in Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/difference-time-series-dataset-python\/\">How to Difference a Time Series Dataset with Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/normalize-standardize-time-series-data-python\/\">How to Normalize and Standardize Time Series Data in Python<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.boxcox.html\">scipy.stats.boxcox API<\/a><\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.MinMaxScaler.html\">sklearn.preprocessing.MinMaxScaler API<\/a><\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html\">sklearn.preprocessing.StandardScaler API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Power_transform\">Power transform on Wikipedia<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this post, you discovered how to perform and invert four common data transforms for time series data in machine learning.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to transform and inverse the transform for four 
methods in Python.<\/li>\n<li>Important considerations when using transforms on training and test datasets.<\/li>\n<li>The suggested order for transforms when multiple operations are required on a dataset.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/machine-learning-data-transforms-for-time-series-forecasting\/\">4 Common Machine Learning Data Transforms for Time Series Forecasting<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/machine-learning-data-transforms-for-time-series-forecasting\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Time series data often requires some preparation prior to being modeled with machine learning algorithms. 
For example, differencing operations can be used [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/21\/4-common-machine-learning-data-transforms-for-time-series-forecasting\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":962,"comment_status":"registered_only","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/961"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=961"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/961\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/962"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=961"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=961"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=961"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}