Taxonomy of Time Series Forecasting Problems

Author: Jason Brownlee

When you are presented with a new time series forecasting problem, there are many things to consider.

The choices that you make directly impact each step of the project, from the design of a test harness for evaluating forecast models to the fundamental difficulty of the forecast problem that you are working on.

It is possible to very quickly narrow down the options by working through a series of questions about your time series forecasting problem. By considering a few themes and questions within each theme, you can narrow down the type of problem, test harness, and even choice of algorithms for your project.

In this post, you will discover a framework that you can use to quickly understand and frame your time series forecasting problem.

Let’s get started.

Framework Overview

Time series forecasting involves developing and using a predictive model on data where there is an ordered relationship between observations.

You can learn more about what time series forecasting is in this post:

What Is Time Series Forecasting?

Before you get started on your project, you can answer a few questions and greatly improve your understanding of the structure of your forecast problem, the structure of the model required, and how to evaluate it.

The framework presented in this post is divided into eight parts; they are:

Inputs vs. Outputs
Endogenous vs. Exogenous
Regression vs. Classification
Unstructured vs. Structured
Univariate vs. Multivariate
Single-step vs. Multi-step
Static vs. Dynamic
Contiguous vs. Discontiguous

I recommend working through this framework before starting any time series forecasting project.

Your answers may not be crisp the first time through, and the questions may require you to study the data and the domain, and to talk to experts and stakeholders.

Update your answers as you learn more. Doing so will help you stay on track, avoid distractions, and develop the actual model that you need for your project.

1. Inputs vs. Outputs

Generally, a prediction problem involves using past observations to predict or forecast one or more possible future observations.

The goal is to guess about what might happen in the future.

When you are required to make a forecast, it is critical to think about the data that you will have available to make the forecast and what you will be guessing about the future.

We can summarize this as the inputs and outputs of the model when making a single forecast.

Inputs: Historical data provided to the model in order to make a single forecast.
Outputs: Prediction or forecast for a future time step beyond the data provided as input.

The input data is not the data used to train the model. We are not at that point yet. It is the data used to make one forecast, for example the last seven days of sales data to forecast the next one day of sales data.
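
As a rough illustration of the seven-days-in, one-day-out example, the sketch below splits a series into input and output pairs with a sliding window. The function name make_windows and the sales values are hypothetical, made up for illustration only.

```python
def make_windows(series, n_in=7, n_out=1):
    """Split a sequence into (inputs, outputs) pairs using a sliding window."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        outputs = series[start + n_in:start + n_in + n_out]
        pairs.append((inputs, outputs))
    return pairs

# Hypothetical daily sales figures (made-up values).
daily_sales = [12, 15, 14, 18, 20, 22, 19, 21, 24, 23]

windows = make_windows(daily_sales, n_in=7, n_out=1)
for inputs, outputs in windows:
    print(inputs, "->", outputs)
```

Each printed line is the data for one forecast: seven prior days as input and the following day as output. Changing n_in or n_out is one way to explore how much history a forecast may need.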

Defining the inputs and outputs of the model forces you to think about what exactly is or may be required to make a forecast.

You may not be able to be specific when it comes to input data. For example, you may not know whether one or multiple prior time steps are required to make a forecast. But you will be able to identify the variables that could be used to make a forecast.

What are the inputs and outputs for a forecast?

2. Endogenous vs. Exogenous

The input data can be further subdivided in order to better understand its relationship to the output variable.

An input variable is endogenous if it is affected by other variables in the system and the output variable depends on it.

In a time series, the observations for an input variable depend upon one another. For example, the observation at time t is dependent upon the observation at t-1; t-1 may depend on t-2, and so on.

An input variable is an exogenous variable if it is independent of other variables in the system and the output variable depends upon it.

Put simply, endogenous variables are influenced by other variables in the system (including themselves), whereas exogenous variables are not and are considered to be outside the system.

Endogenous: Input variables that are influenced by other variables in the system and on which the output variable depends.
Exogenous: Input variables that are not influenced by other variables in the system and on which the output variable depends.

Typically, a time series forecasting problem has endogenous variables (e.g. the output is a function of some number of prior time steps) and may or may not have exogenous variables.

Often, exogenous variables are ignored given the strong focus on the time series. Explicitly thinking about both variable types may help to identify easily overlooked exogenous data or even engineered features that may improve the model.
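
To make the distinction concrete, the sketch below builds model inputs from both kinds of variable. The data is made up, and it assumes that a promotion flag is set outside the system (exogenous) and is known in advance for the day being forecast. The name build_rows is hypothetical.

```python
# Hypothetical data: daily sales (the series being forecast) and a
# promotion flag that is assumed to be decided outside the system.
sales = [10, 12, 11, 15, 14, 18]
promotion = [0, 0, 1, 0, 1, 0]

def build_rows(target, exog, n_lags=2):
    """Pair lagged target values (endogenous) with the exogenous value
    for the time step being forecast."""
    rows = []
    for t in range(n_lags, len(target)):
        rows.append({
            "endogenous_lags": target[t - n_lags:t],  # prior observations of the series
            "exogenous": exog[t],                     # assumed known ahead of time
            "output": target[t],                      # value to be forecast
        })
    return rows

rows = build_rows(sales, promotion)
for row in rows:
    print(row)
```

If the exogenous value would not be available at forecast time, it would itself need to be forecast or dropped from the inputs.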

What are the endogenous and exogenous variables?

3. Regression vs. Classification

Regression predictive modeling problems are those where a quantity is predicted.

A quantity is a numerical value; for example a price, a count, a volume, and so on. A time series forecasting problem in which you want to predict one or more future numerical values is a regression type predictive modeling problem.

Classification predictive modeling problems are those where a category is predicted.

A category is a label from a small well-defined set of labels; for example {“hot”, “cold”}, {“up”, “down”}, and {“buy”, “sell”} are categories.

A time series forecasting problem in which you want to classify input time series data is a classification type predictive modeling problem.

Regression: Forecast a numerical quantity.
Classification: Classify as one of two or more labels.

Are you working on a regression or classification predictive modeling problem?

There is some flexibility between these types.

For example, a regression problem can be reframed as classification and a classification problem can be reframed as regression. Some problems, like predicting an ordinal value, can be framed as either classification or regression. It is possible that a reframing of your time series forecasting problem may simplify it.
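
One possible reframing is sketched below: a series of numerical values is converted into "up" and "down" labels, so that a model predicts the direction of the next change rather than the value itself. The prices are made up, and labeling an unchanged value as "down" is an arbitrary assumption for this example.

```python
def to_direction_labels(values):
    """Label each step "up" if the value rose versus the prior step, else "down"."""
    labels = []
    for previous, current in zip(values, values[1:]):
        # Assumption: an unchanged value is labeled "down".
        labels.append("up" if current > previous else "down")
    return labels

# Hypothetical prices (made-up values).
prices = [100.0, 101.5, 101.0, 103.2, 103.2]
labels = to_direction_labels(prices)
print(labels)
```

The labels have one fewer element than the input because the first observation has no prior value to compare against.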

What are some alternate ways to frame your time series forecasting problem?

4. Unstructured vs. Structured

It is useful to plot each variable in a time series and inspect the plot looking for possible patterns.

A time series for a single variable may not have any obvious pattern.

We can think of a series with no pattern as unstructured, as in there is no discernible time-dependent structure.

Alternately, we can think of a time series with obvious patterns, such as a trend or seasonal cycles, as structured.

We can often simplify the modeling process by identifying and removing the obvious structures from the data, such as an increasing trend or repeating cycle. Some classical methods even allow you to specify parameters to handle these systematic structures directly.
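
Differencing is one common way to remove this kind of structure. The minimal sketch below subtracts the value one step back to remove a linear trend, and the value one full cycle back to remove a repeating cycle. The series and the cycle length of 4 are assumptions made up for illustration.

```python
def difference(series, lag=1):
    """Subtract the observation `lag` steps earlier from each observation."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Made-up series with a steady upward trend.
trending = [10, 12, 14, 16, 18]
print(difference(trending))           # lag-1 differencing removes the linear trend

# Made-up series that repeats every 4 steps while drifting upward.
seasonal = [5, 8, 6, 3, 7, 10, 8, 5]
print(difference(seasonal, lag=4))    # lag-4 differencing removes the repeating cycle
```

A forecast made on the differenced series must be converted back to the original scale by adding the earlier observation that was subtracted.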

Unstructured: No obvious systematic time-dependent pattern in a time series variable.
Structured: Systematic time-dependent patterns in a time series variable (e.g. trend and/or seasonality).

Are the time series variables unstructured or structured?

5. Univariate vs. Multivariate

A single variable measured over time is referred to as a univariate time series. Univariate means one variate or one variable.

Multiple variables measured over time are referred to as a multivariate time series: multiple variates or multiple variables.

Univariate: One variable measured over time.
Multivariate: Multiple variables measured over time.

Are you working on a univariate or multivariate time series problem?

Considering this question with regard to inputs and outputs may add a further distinction. The number of variables may differ between the inputs and outputs, i.e. the data may not be symmetrical.

For example, you may have multiple variables as input to the model and only be interested in predicting one of the variables as output. In this case, there is an assumption in the model that the multiple input variables aid and are required in predicting the single output variable.

Univariate and Multivariate Inputs: One or multiple input variables measured over time.
Univariate and Multivariate Outputs: One or multiple output variables to be predicted.
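
The asymmetric case can be sketched as follows: several variables are provided as input, but only one of them is predicted. The weather readings and the name frame_multivariate are hypothetical and used only for illustration.

```python
# Hypothetical observations: one row per time step,
# columns are (temperature, humidity, pressure).
observations = [
    (20.1, 0.60, 1012),
    (21.3, 0.58, 1011),
    (19.8, 0.65, 1009),
    (18.9, 0.70, 1008),
    (20.5, 0.62, 1010),
]

def frame_multivariate(rows, n_in, target_index):
    """Use all variables from the prior n_in steps as input and a single
    variable at the next step as output."""
    samples = []
    for t in range(n_in, len(rows)):
        inputs = rows[t - n_in:t]        # multivariate input
        output = rows[t][target_index]   # univariate output
        samples.append((inputs, output))
    return samples

# Predict temperature (column 0) from two prior steps of all three variables.
samples = frame_multivariate(observations, n_in=2, target_index=0)
for inputs, output in samples:
    print(inputs, "->", output)
```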

6. Single-step vs. Multi-step

A forecast problem that requires a prediction of only the next time step is called a one-step forecast problem.

A forecast problem that requires a prediction of more than one time step is called a multi-step forecast problem.

The more time steps to be projected into the future, the more challenging the problem given the compounding nature of the uncertainty on each forecasted time step.

One-Step: Forecast the next time step.
Multi-Step: Forecast more than one future time step.
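
One way to make a multi-step forecast is to apply a one-step model recursively, feeding each prediction back in as if it were an observation. The sketch below does this with a stand-in model (the mean of the last three values), chosen only to keep the example small. It also shows why uncertainty compounds: later forecasts are built on earlier forecasts rather than on real data.

```python
def one_step_forecast(history, window=3):
    """Stand-in one-step model: the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def recursive_forecast(history, n_steps, window=3):
    """Forecast n_steps ahead by repeatedly applying the one-step model."""
    extended = list(history)
    forecasts = []
    for _ in range(n_steps):
        yhat = one_step_forecast(extended, window)
        forecasts.append(yhat)
        extended.append(yhat)  # later steps depend on earlier predictions
    return forecasts

history = [3.0, 6.0, 9.0]
forecasts = recursive_forecast(history, n_steps=3)
print(forecasts)
```

An alternative, not shown here, is to develop a model that outputs all required time steps directly.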

Do you require a single-step or a multi-step forecast?

7. Static vs. Dynamic

It is possible to develop a model once and use it repeatedly to make predictions.

Given that the model is not updated or changed between forecasts, we can think of this model as being static.

Conversely, we may receive new observations prior to making a subsequent forecast that could be used to create a new model or update the existing model. We can think of developing a new or updated model prior to each forecast as a dynamic problem.

For example, if the problem requires a forecast at the beginning of the week for the week ahead, we may receive the true observation at the end of the week that we can use to update the model prior to making next week's forecast. This would be a dynamic model. If we do not get a true observation at the end of the week, or we do and choose not to re-fit the model, this would be a static model.

We may prefer a dynamic model, but the constraints of the domain or the limitations of a chosen algorithm may make this intractable.

Static: A forecast model is fit once and used to make predictions.
Dynamic: A forecast model is fit on newly available data prior to each prediction.
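
The difference can be sketched with a small walk-forward loop. The "model" here is a stand-in that simply predicts the mean of the data it was fit on, and all values are made up. The only difference between the two runs is whether the model is re-fit once each true observation arrives.

```python
def fit_mean_model(train):
    """Stand-in model: predicts the mean of the data it was fit on."""
    return sum(train) / len(train)

def walk_forward(train, test, refit):
    """Forecast each test step in turn, optionally re-fitting beforehand."""
    history = list(train)
    model = fit_mean_model(history)          # initial fit
    predictions = []
    for actual in test:
        if refit:
            model = fit_mean_model(history)  # dynamic: re-fit on all data so far
        predictions.append(model)
        history.append(actual)               # true observation arrives after the forecast
    return predictions

train = [10.0, 12.0, 14.0]
test = [16.0, 18.0]

static_preds = walk_forward(train, test, refit=False)
dynamic_preds = walk_forward(train, test, refit=True)
print("static: ", static_preds)   # [12.0, 12.0]
print("dynamic:", dynamic_preds)  # [12.0, 13.0]
```

The same loop can serve as a test harness, which is one reason this question is worth answering before evaluating any models.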

Do you require a static or a dynamically updated model?

8. Contiguous vs. Discontiguous

A time series where the observations are uniform over time may be described as contiguous.

Many time series problems have contiguous observations, such as one observation each hour, day, month or year.

A time series where the observations are not uniform over time may be described as discontiguous.

The lack of uniformity of the observations may be caused by missing or corrupt values. It may also be a feature of the problem where observations are only made available sporadically or at increasingly or decreasingly spaced time intervals.

In the case of non-uniform observations, specific data formatting may be required when fitting some models to make the observations uniform over time.

Contiguous: Observations are uniform over time.
Discontiguous: Observations are not uniform over time.
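
As a minimal sketch of this kind of data formatting, the example below places sporadic observations onto a uniform hourly grid and fills the gaps by carrying the last known value forward. Forward filling is only one option and is an assumption of this example; interpolation or an explicit missing-value marker may suit other problems better. The readings are made up.

```python
def to_uniform_grid(observations, start, end, step=1):
    """Place sparse (time, value) observations onto a uniform grid,
    carrying the last known value forward across gaps."""
    lookup = dict(observations)
    grid = []
    last_value = None  # stays None until the first observation is seen
    for t in range(start, end + 1, step):
        if t in lookup:
            last_value = lookup[t]
        grid.append((t, last_value))
    return grid

# Hypothetical hourly readings with hours 2 and 3 missing.
readings = [(0, 5.0), (1, 5.5), (4, 7.0), (5, 7.2)]

uniform = to_uniform_grid(readings, start=0, end=5)
for hour, value in uniform:
    print(hour, value)
```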

Are your observations contiguous or discontiguous?

Summary

To review, the themes and questions you can ask about your problem are as follows:

Inputs vs. Outputs
1. What are the inputs and outputs for a forecast?
Endogenous vs. Exogenous
1. What are the endogenous and exogenous variables?
Regression vs. Classification
1. Are you working on a regression or classification predictive modeling problem?
2. What are some alternate ways to frame your time series forecasting problem?
Unstructured vs. Structured
1. Are the time series variables unstructured or structured?
Univariate vs. Multivariate
1. Are you working on a univariate or multivariate time series problem?
Single-step vs. Multi-step
1. Do you require a single-step or a multi-step forecast?
Static vs. Dynamic
1. Do you require a static or a dynamically updated model?
Contiguous vs. Discontiguous
1. Are your observations contiguous or discontiguous?
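
If it helps to keep your answers in one place and update them as you learn more, they could be recorded in a small structure like the sketch below. The class name, field names, and example answers are all hypothetical; adapt them to your own project.

```python
from dataclasses import dataclass, field

@dataclass
class ForecastProblemFrame:
    """Hypothetical record of answers to the framework questions."""
    inputs: str
    outputs: str
    endogenous: list
    exogenous: list
    problem_type: str          # "regression" or "classification"
    structured: bool           # trend and/or seasonality present?
    multivariate_inputs: bool
    multivariate_outputs: bool
    multi_step: bool
    dynamic: bool              # re-fit before each forecast?
    contiguous: bool
    notes: list = field(default_factory=list)

# Made-up answers for an imaginary daily sales problem.
frame = ForecastProblemFrame(
    inputs="last 7 days of sales",
    outputs="next day of sales",
    endogenous=["sales"],
    exogenous=["promotion"],
    problem_type="regression",
    structured=True,
    multivariate_inputs=True,
    multivariate_outputs=False,
    multi_step=False,
    dynamic=True,
    contiguous=True,
)
frame.notes.append("Confirm with stakeholders whether promotions are known in advance.")
print(frame)
```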

Did you find this framework useful for your time series forecasting problem?
Let me know in the comments below.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post Taxonomy of Time Series Forecasting Problems appeared first on Machine Learning Mastery.
