In this course we are interested mostly in univariate discrete-time series, which may be denoted by \(\{z_t\}\) or more explicitly \(z_1, z_2, \ldots\). Sometimes we may consider bivariate time series \((x_t, y_t),\ t=1,2,\ldots\) or, more generally, \(k\)-variate time series \((x_{t,j}),\ t=1,2,\ldots;\ j=1,\ldots,k\).
The data generating mechanism is assumed to be some probability model. In the simplest case, we may assume the series is Gaussian, which in the univariate case simply means that for any \(T\) and any real constants \(\alpha_1,\ldots,\alpha_T\), the linear combination \(\alpha_1 z_1 + \ldots + \alpha_T z_T\) is normally distributed. In many cases the normal assumption works very well as an approximation.
Two important illustrative examples of time series models are the white noise model and the random walk with deterministic drift.
For the white noise model, we assume that \(z_t, t=1,2,\ldots\) is a collection of independent and identically distributed random variables. Often we also assume that these variables are centered to have mean zero, so \({\rm E}\{z_t\} = 0,\ t=1,2,\ldots\), with \({\rm Var}(z_t) = \sigma_z^2\). When this distribution is normal, the term Gaussian white noise is used.
The term white noise arose in electrical engineering, where it is useful to decompose a time series into a sum of random sinusoids. For white noise, the expected amplitudes are equal at all frequencies, just as ordinary “white” light is decomposed by a prism into the familiar ROYGBIV spectrum.
Sometimes a more general definition of white noise is used, in which the only requirements are constant mean, constant variance, and uncorrelatedness. These assumptions may be written \({\rm E}\{z_t\} = 0\), \({\rm Var}\{z_t\} = \sigma_z^2\), and \({\rm Cov}\{z_t, z_s\} = 0,\ t \ne s;\ t,s=1,2,\ldots\)
The infinite-dimensional collection of random variables that comprises the time series is known as an ensemble, which is the counterpart of the sample space in mathematical statistics.
In practice we obtain \(T\) consecutive observations of a time series, \(z_t, t=1,\ldots,T\). Our methods are most useful when \(T \ge 50\) and preferably \(T \ge 100\). This is a desirable sample size, not an absolute requirement since we must often deal with shorter time series. Four examples of Gaussian white noise with mean 100 and standard deviation 15 with \(T = 50\) are shown below.
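A minimal sketch of how such realizations may be simulated in R is given below; the seed is an arbitrary choice for reproducibility, not from the notes.

```r
# Simulate and plot four Gaussian white noise series with mean 100,
# standard deviation 15, and T = 50, matching the text.
set.seed(123)   # arbitrary seed
par(mfrow = c(2, 2))
for (i in 1:4) {
  z <- ts(rnorm(50, mean = 100, sd = 15))
  plot(z, ylab = expression(z[t]), main = paste("Realization", i))
}
```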
Given an observed time series \(z_t\) for \(t=1,\ldots,T\), the optimal forecast of a future value \(z_{T+\ell},\ \ell \ge 1\), is the expected value under the model conditional on the observed data \(z_1, \ldots, z_T\). The optimal forecast is denoted by \(z_T(\ell)\). This notation indicates that \(z_T(\ell)\) is the optimal forecast of the future value at forecast origin \(T\) and lead time \(\ell\).
For the simple example of Gaussian white noise with mean 100 and standard deviation 15, \(z_T(\ell) = 100\) for all \(\ell \ge 1\), and the forecast variance, denoted \(V_\ell\), is \(V_\ell = 15^2,\ \ell = 1, 2, \ldots\).
Under the normality assumption, the \(1-\alpha\) prediction interval for \(z_{T+\ell}\) is \(z_T(\ell) \pm \Phi^{-1}(1-\alpha/2) \sqrt{V_{\ell}}\), where \(\Phi^{-1}\) is the inverse normal CDF. In time series prediction we often use 50%, 75%, and 95% intervals, corresponding to \(\alpha=0.5, 0.25, 0.05\) and hence to \(\Phi^{-1}(1-\alpha/2) \approx 0.67, 1.15, 1.96\).
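These multipliers and intervals are easily computed in R with qnorm(), the inverse normal CDF:

```r
# Prediction intervals for the Gaussian white noise example
# (mean 100, sd 15), at the 50%, 75%, and 95% levels.
alpha <- c(0.50, 0.25, 0.05)
mult  <- qnorm(1 - alpha / 2)            # 0.674, 1.150, 1.960
cbind(level = 1 - alpha,
      lower = 100 - mult * 15,
      upper = 100 + mult * 15)
```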
Under more general distributional assumptions than normality, the prediction intervals may be estimated using a parametric bootstrap.
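As a minimal sketch of the idea, suppose the white noise has heavy-tailed \(t\) innovations; the distribution, sample size, and fixed degrees of freedom below are illustrative assumptions, not part of the notes.

```r
# Parametric bootstrap 95% prediction interval for z_{T+1} under an
# assumed t distribution with 5 degrees of freedom (illustrative).
set.seed(1)
z <- 100 + 15 * rt(200, df = 5)               # "observed" series
df.hat    <- 5                                # in practice, estimated
mu.hat    <- mean(z)
scale.hat <- sd(z) * sqrt((df.hat - 2) / df.hat)
sims <- mu.hat + scale.hat * rt(10000, df = df.hat)  # simulated z_{T+1}
quantile(sims, c(0.025, 0.975))
```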
The time series \(z_t, t=1,2,\ldots\) is stationary if and only if its mean and variance exist and are constant over time, and \(\gamma_k = {\rm Cov}(z_t, z_{t-k})\) exists and does not depend on \(t\). The function \(\gamma_k,\ k = 0, \pm 1, \pm 2, \ldots\) is called the theoretical autocovariance function (TACVF). In many applications we will use the autocorrelation function instead. The theoretical autocorrelation function (TACF) is defined by
\[\rho_k = \frac{\gamma_k}{\gamma_0}.\]
For white noise \(\rho_k = 0,\ k \ne 0\) and \(\rho_0 = 1\).
Given \(T\) consecutive observations \(z_t, t=1,\ldots,T\) the sample autocovariance function (SACVF) at lag \(k=0,1,\ldots\) is defined by
\[c_k = \frac{1}{T} \sum_{t=k+1}^{T} (z_t-\bar{z})(z_{t-k} - \bar{z}),\]
where \(\bar{z}\) is the sample mean of \(z_1, \ldots, z_T\); setting \(c_{-k}=c_{k}\), the SACVF is symmetric about lag zero.
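In R, the built-in acf() function computes both quantities; with type = "covariance" it returns the \(c_k\) defined above, using the \(1/T\) divisor:

```r
# Sample autocovariances and autocorrelations of a simulated series.
set.seed(42)
z <- rnorm(100, mean = 100, sd = 15)
acf(z, type = "covariance", plot = FALSE)  # c_0, c_1, ...
acf(z)                                     # sample ACF r_k = c_k / c_0
```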
We retrieve the SHOP data and display its time series plot. There are T = 5376 consecutive observations shown in the time series plot, corresponding to successive trading days in the period 2015-05-20 to 2018-12-07. The series clearly exhibits non-stationarity due to trends and to non-constant variance over time, in addition to the random volatility changes that are characteristic of many financial time series.
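The notes do not say how the SHOP data were obtained; one possibility, sketched below, uses the CRAN package quantmod, where the data source and the use of adjusted closing prices are assumptions.

```r
# A sketch of retrieving SHOP (Shopify) prices; source and price column
# are illustrative assumptions, not from the notes.
library(quantmod)
getSymbols("SHOP", src = "yahoo",
           from = "2015-05-20", to = "2018-12-07")
z <- Ad(SHOP)                        # adjusted closing prices
plot(z, main = "SHOP adjusted close")
```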
Because of the strong trend, the SACF is close to 1 and decays very gradually. Mathematical analysis has shown that this slow, approximately linear decay is characteristic of many non-stationary time series. The SACF is often used in practice to decide whether a time series is stationary or non-stationary.
The autocorrelation plot of the differenced series \(w_t = z_{t} - z_{t-1}\) is shown below. Most of the sample autocorrelations lie within the 95% benchmark limits, \(\pm 1.96 \times T^{-\frac{1}{2}} \approx \pm 0.03\), so the differenced series is approximately uncorrelated, like white noise.
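Continuing the retrieval sketch above:

```r
# ACF of the first differences w_t = z_t - z_{t-1}; acf() draws the
# +/- 1.96/sqrt(T) benchmark limits automatically.
w <- na.omit(diff(z))
acf(as.numeric(w), main = "SACF of differenced SHOP series")
```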
But white noise does not imply statistical independence (unless it is also assumed to be Gaussian). In fact, the squared differences are definitely autocorrelated, as shown in the plot below. Stock market returns often show similar behaviour.
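Continuing the same sketch:

```r
# ACF of the squared differences: clearly autocorrelated, so the
# differences are uncorrelated but not independent.
acf(as.numeric(w)^2, main = "SACF of squared differences")
```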
[Official statistics](https://en.wikipedia.org/wiki/Official_statistics) is the name given to data published by government agencies such as Statistics Canada. Many of these “official statistics”, such as unemployment rates and consumer price indices, are published on a monthly basis and reflect seasonal variations, so seasonal adjustment methods are used to produce seasonally adjusted estimates.
Seasonal adjustment methods use one of the following decompositions: the additive decomposition
\[z_t = T_t + S_t + R_t,\]
with \({\rm SA}_t = z_t - S_t\), or the multiplicative decomposition
\[z_t = T_t \times S_t \times R_t,\]
with \({\rm SA}_t = z_t/S_t\),
where \(z_t\) is the observed time series at time \(t\), \(T_t\) is the long-term trend, \(S_t\) is the seasonal component (including trading-day effects), \(R_t\) is the remainder or irregular component, and \({\rm SA}_t\) is the seasonally adjusted time series.
The remainder is stationary but not white noise. In fact, the seasonal decomposition is not a probability model at all. It is fundamentally just an algorithm that decomposes the time series into components that may intuitively be considered as representing the long-term trend, seasonal and trading-day effects, and a remainder of what is left over.
Three useful seasonal decomposition algorithms are available in R: stats::decompose(), the simplest; stats::stl(); and seasonal::seas(). The first two are built-in R functions, while seasonal::seas() requires installation of the CRAN package seasonal. It is the most elaborate of the three, implementing a method used by the US Census Bureau that is very similar to the algorithm used by Statistics Canada.
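As a minimal sketch, the three functions may be called as follows; the multiplicative type and the s.window choice are illustrative assumptions.

```r
# The three decomposition functions applied to a monthly series
# (AirPassengers, used later in these notes).
fit1 <- stats::decompose(AirPassengers, type = "multiplicative")
fit2 <- stats::stl(log(AirPassengers), s.window = "periodic")
fit3 <- seasonal::seas(AirPassengers)   # needs install.packages("seasonal")
plot(fit1)
```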
We may fit forecasting models to the individual components \(T_t\), \(S_t\) and \(R_t\) and forecast each component separately and then combine them.
The decomposition is illustrated with the classic example of a complex monthly time series comprising the number (in units of 1000) of airline passengers travelling from England to New York, January 1949 to December 1960. This is one of the built-in datasets in R, AirPassengers.
It was shown many years ago that this complex time series could be forecast quite accurately using a type of EWMA (exponentially weighted moving average).
We show below the multiplicative decomposition of this time series. The seasonal component is assumed fixed, whereas in more advanced algorithms it is allowed to vary over time.
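A one-line sketch of the plot shown below:

```r
# Classical multiplicative decomposition with a fixed seasonal component.
plot(decompose(AirPassengers, type = "multiplicative"))
```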
A recent algorithm from the US Census Bureau is X-13ARIMA-SEATS, implemented in the CRAN package seasonal. This algorithm is very similar to that used by Statistics Canada and other official government statistical agencies. Its sophisticated application is a bit complex, but it has wisely chosen defaults, since in many applications it is desirable to seasonally decompose many thousands of time series.
We show for comparison the seasonal decomposition of the AirPassengers time series.
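A minimal sketch using the package defaults:

```r
# X-13ARIMA-SEATS with the default settings chosen by seasonal::seas().
library(seasonal)
fit <- seas(AirPassengers)
plot(fit)      # original vs. seasonally adjusted series
summary(fit)   # model and diagnostics chosen by the defaults
```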
The R function stats::monthplot() is useful for visualization of the seasonal component.
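For example, applied to the seasonal component of a classical decomposition (an illustrative choice):

```r
# Month plot: one panel per calendar month, showing each month's subseries.
dec <- decompose(AirPassengers, type = "multiplicative")
monthplot(dec$seasonal, ylab = "seasonal factor")
```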
A major problem with seasonal adjustment algorithms is that they have often been used, or rather misused, for policy and political purposes. This is problematic because of revisions: after publication, new data on a variable of interest, say the unemployment rate, become available and give a revised estimate of the previously published result. Small revisions may be acceptable, but large revisions are not. Researchers at Statistics Canada found that by using seasonal ARIMA time series models to extrapolate both ends of the time series, the impact of revisions could be greatly reduced.
The R function stl() implements the seasonal-trend-loess (STL) method of seasonal adjustment (Cleveland et al., 1990).
The method requires a regularly sampled seasonal time series with no missing values; for monthly data this means an observation in every month of each year in the span.
The traditional additive seasonal decomposition model is used, \(z_t = T_t + S_t + R_t\). Multiplicative decompositions are fit by taking logarithms of the data.
First the trend term \(T_t\) is estimated using a loess smooth of \(z_t\) regressed on \(t\). Next the seasonal term \(S_t\) is estimated using a separate loess smooth for each seasonal subseries \(z_{r,s}\), where \(r=1,\ldots,n\), \(n\) is the number of years, and \(s=1,\ldots,12\). Lastly the remainder term is obtained as \(R_t = z_t - T_t - S_t\). The whole procedure is iterated using a backfitting algorithm. Incidentally, the backfitting algorithm that is widely used today in Generalized Additive Models was invented in the 1950s during the development of the X-11 seasonal decomposition.
The STL decomposition is illustrated with the famous Mauna Loa CO2 time series (featured in Al Gore’s movie, An Inconvenient Truth). This dataset is available in R as co2, and its STL decomposition is shown below. A notable feature revealed by this plot is that the seasonal amplitudes appear to be increasing.
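A minimal sketch of the STL fit, where the s.window value is an illustrative choice that lets the seasonal amplitudes evolve (s.window = "periodic" would hold them fixed):

```r
# STL decomposition of the Mauna Loa CO2 series.
fit <- stl(co2, s.window = 21)   # odd span >= 7; 21 is illustrative
plot(fit)
```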
The change in seasonal amplitudes is evident from the monthly subseries plot of the seasonal component.
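Continuing the sketch above:

```r
# Monthly subseries plot of the STL seasonal component; rising subseries
# reflect the growing seasonal amplitude.
monthplot(fit$time.series[, "seasonal"], ylab = "seasonal")
```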
The irregular component is not white noise as demonstrated in the ACF plot below.
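Again continuing the sketch above:

```r
# ACF of the STL remainder; values outside the benchmark limits show
# that the irregular component is not white noise.
acf(fit$time.series[, "remainder"], main = "ACF of STL remainder")
```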
Other widely known decompositions used for time series are: