Contiguous or noncontiguous time series values of your forecasting model – A time series that present a consistent temporal interval (for example, every five minutes, every two hours, or every quarter) between each other are defined as contiguous (Zuo et al. 2019). On the other hand, time series that are not uniform over time may be defined as noncontiguous: very often the reason behind noncontiguous timeseries may be missing or corrupt values. Before jumping to the methods of data imputation, it is important to understand the reason data goes missing. There are three most common reasons for this:Missing at random: Missing at random means that the propensity for a data point to be missing is not related to the missing data but it is related to some of the observed data.Missing completely at random: The fact that a certain value is missing has nothing to do with its hypothetical value and with the values of other variables.Missing not at random: Two possible reasons are that the missing value depends on the hypothetical value or the missing value is dependent on some other variable's value.In the first two cases, it is safe to remove the data with missing values depending upon their occurrences, while in the third case removing observations with missing values can produce a bias in the model. There are different solutions for data imputation depending on the kind of problem you are trying to solve, and it is difficult to provide a general solution. Moreover, since it has temporal property, only some of the statistical methodologies are appropriate for time series data.I have identified some of the most commonly used methods and listed them as a structural guide in Figure 1.7.Figure 1.7: Handling missing dataAs you can observe from the graph in Figure 1.7, listwise deletion removes all data for an observation that has one or more missing values. Particularly if the missing data is limited to a small number of observations, you may just opt to eliminate those cases from the analysis. However, in most cases it is disadvantageous to use listwise deletion. This is because the assumptions of the missing completely at random method are typically rare to support. As a result, listwise deletion methods produce biased parameters and estimates.Pairwise deletion analyses all cases in which the variables of interest are present and thus maximizes all data available by an analysis basis. A strength to this technique is that it increases power in your analysis, but it has many disadvantages. It assumes that the missing data is missing completely at random. If you delete pairwise, then you'll end up with different numbers of observations contributing to different parts of your model, which can make interpretation difficult.Deleting columns is another option, but it is always better to keep data than to discard it. Sometimes you can drop variables if the data is missing for more than 60 percent of the observations but only if that variable is insignificant. Having said that, imputation is always a preferred choice over dropping variables.
Regarding time series specific methods, there are a few options:Linear interpolation: This method works well for a time series with some trend but is not suitable for seasonal data.Seasonal adjustment and linear interpolation: This method works well for data with both trend and seasonality.Mean, median, and mode: Computing the overall mean, median, or mode is a very basic imputation method; it is the only tested function that takes no advantage of the time series characteristics or relationship between the variables. It is very fast but has clear disadvantages. One disadvantage is that mean imputation reduces variance in the data set.
In the next section of this chapter, we will discuss how to shape time series as a supervised learning problem and, as a consequence, get access to a large portfolio of linear and nonlinear machine learning algorithms.
Supervised Learning for Time Series Forecasting
Machine learning is a subset of artificial intelligence that uses techniques (such as deep learning) that enable machines to use experience to improve at tasks (aka.ms/deeplearningVSmachinelearning
). The learning process is based on the following steps:
1 Feed data into an algorithm. (In this step you can provide additional information to the model, for example, by performing feature extraction.)
2 Use this data to train a model.
3 Test and deploy the model.
4 Consume the deployed model to do an automated predictive task. In other words, call and use the deployed model to receive the predictions returned by the model (aka.ms/deeplearningVSmachinelearning).
Machine learning is a way to achieve artificial intelligence. By using machine learning and deep learning techniques, data scientists can build computer systems and applications that do tasks that are commonly associated with human intelligence. These tasks include time series forecasting, image recognition, speech recognition, and language translation (aka.ms/deeplearningVSmachinelearning
).
There are three main classes of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In the following few paragraphs, we will have a closer look at each of these machine learning classes:
Supervised learning is a type of machine learning system in which both input (which is the collection of values for the variables in your data set) and desired output (the predicted values for the target variable) are provided. Data is identified and labeled a priori to provide the algorithm a learning memory for future data handling. An example of a numerical label is the sale price associated with a used car (aka.ms/MLAlgorithmCS). The goal of supervised learning is to study many labeled examples like these and then to be able to make predictions about future data points, like, for example, assigning accurate sale prices to other used cars that have similar characteristics to the one used during the labeling process. It is called supervised learning because data scientists supervise the process of an algorithm learning from the training data set (www.aka.ms/MLAlgorithmCS): they know the correct answers and they iteratively share them with the algorithm during the learning process. There are several specific types of supervised learning. Two of the most common are classification and regression:Classification: Classification