Random Walk and Forecasting

Random walk with deterministic drift

The model equation is

\[ z_t = \delta + z_{t-1} + e_t, \quad t=1,2,\ldots, \]

where \(\delta\) is the drift parameter and \(e_t\) is white noise with mean 0 and variance \(\sigma_e^2\). We also need to specify the initial value \(z_0\).

By repeated substitution, the random walk can be written in random shock form

\[ z_t = z_0 + t \delta + \sum_{s=1}^t e_s, \quad t=1,2,\ldots. \]

This equation shows how the time series is driven by the random shocks \(e_1, e_2, \ldots\). The time series can be decomposed as

\[z_t = {\rm current\ random\ shock} + {\rm deterministic\ trend} + {\rm sum\ of\ past\ shocks} = e_t + (z_0 + t\delta) + \sum_{s=1}^{t-1} e_s.\]

The random shocks \(e_t, t=1,2,\ldots\) are poetically called the time series innovations since they indicate the new information that arrives at time \(t\).

In the simulated random walks below we take \(T=50\), \(\delta=0.25\) and \(\sigma_e^2 = 1\).
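A minimal base-R sketch of such a simulation is given below. It assumes \(z_0 = 0\) and normally distributed shocks (neither is stated above), and the seed is arbitrary. The series is generated twice, once from the recursive model equation and once from the random shock form, to show that the two forms produce the same path.

```r
# Simulate a random walk with deterministic drift.
# Assumptions (not from the text): z0 = 0, Gaussian shocks, arbitrary seed.
set.seed(2024)
T_len   <- 50     # series length T
delta   <- 0.25   # drift
sigma_e <- 1      # shock sd, so sigma_e^2 = 1
z0      <- 0      # assumed initial value

e <- rnorm(T_len, mean = 0, sd = sigma_e)   # innovations e_1, ..., e_T

# Recursive form: z_t = delta + z_{t-1} + e_t
z_rec  <- numeric(T_len)
z_prev <- z0
for (i in seq_len(T_len)) {
  z_rec[i] <- delta + z_prev + e[i]
  z_prev   <- z_rec[i]
}

# Random shock form: z_t = z0 + t * delta + (e_1 + ... + e_t)
z_shock <- z0 + seq_len(T_len) * delta + cumsum(e)

all.equal(z_rec, z_shock)   # TRUE: both forms give the same series

plot(z_shock, type = "l", xlab = "t", ylab = expression(z[t]),
     main = "Random walk with drift")
abline(a = z0, b = delta, lty = 2)   # deterministic trend z0 + t * delta
```

The dashed line is the deterministic part \(z_0 + t\delta\); the gap between it and the path is the accumulated sum of shocks.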

Now consider a sequence of random variables that are generated by the random shock form of the random walk model, so

\[ z_t = z_0 + t \delta + \sum_{s=1}^t e_s, \quad t=1,2,\ldots. \]

Taking the expectation and variance of both sides, and using the fact that the shocks are uncorrelated with mean 0, we obtain \({\cal E}(z_t) = z_0 + \delta t\) and \({\rm Var}(z_t) = t \sigma^2_e\). Since both the mean and the variance depend on \(t\), the time series is not stationary.
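These two moment formulas can be checked by Monte Carlo. The sketch below is an illustration under assumed settings (\(z_0 = 0\), Gaussian shocks, 20000 simulated paths, arbitrary seed): it simulates many paths with \(\delta = 0.25\) and \(\sigma_e = 1\) and compares the sample mean and variance of \(z_t\) at a few time points with \(z_0 + \delta t\) and \(t\sigma_e^2\).

```r
# Monte Carlo check of E(z_t) = z0 + delta * t and Var(z_t) = t * sigma_e^2.
# Assumptions (not from the text): z0 = 0, Gaussian shocks, 20000 replicates.
set.seed(1)
n_rep   <- 20000
T_len   <- 50
delta   <- 0.25
sigma_e <- 1
z0      <- 0

# Row r holds the shocks of path r; column j is time j.
shocks <- matrix(rnorm(n_rep * T_len, sd = sigma_e), nrow = n_rep)

# Running sums of the shocks along each path
cum_shocks <- shocks
for (j in 2:T_len) cum_shocks[, j] <- cum_shocks[, j - 1] + shocks[, j]

# Add the deterministic part z0 + t * delta to column t
paths <- sweep(cum_shocks, 2, z0 + seq_len(T_len) * delta, "+")

t_check  <- c(10, 25, 50)
emp_mean <- colMeans(paths[, t_check])
emp_var  <- apply(paths[, t_check], 2, var)
print(cbind(t = t_check,
            mean_sim = emp_mean, mean_theory = z0 + delta * t_check,
            var_sim  = emp_var,  var_theory  = t_check * sigma_e^2))
```

Both the simulated mean and the simulated variance grow with \(t\), which is exactly why the series cannot be stationary.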

Now consider forecasting at origin time \(T\) with lead time \(\ell\). From the random shock form of the model we may write

\[z_{T+\ell} = z_T + \ell \delta + \sum_{s=T+1}^{T+\ell} e_s, \quad \ell=1,2,\ldots.\]

The future shocks \(e_{T+1}, \ldots, e_{T+\ell}\) have mean zero and are independent of the data observed up to time \(T\). Hence the optimal (minimum mean square error) forecast at origin time \(T\) and lead time \(\ell\) is \(z_T(\ell) = z_T + \ell \delta\), with forecast error variance \(V_\ell = \ell \sigma_e^2\).
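The forecast function and its variance translate directly into code. Below is a small sketch; `rw_forecast()` is a hypothetical helper written for illustration, and the 95% interval assumes normally distributed shocks (hence the 1.96 multiplier).

```r
# Forecasts z_T(ell) = z_T + ell * delta with variance V_ell = ell * sigma_e^2,
# plus approximate 95% limits assuming Gaussian shocks.
# rw_forecast() is a hypothetical helper, not a library function.
rw_forecast <- function(z_T, delta, sigma_e, ell) {
  fc <- z_T + ell * delta     # point forecasts
  V  <- ell * sigma_e^2       # forecast error variances
  data.frame(lead = ell, forecast = fc, variance = V,
             lower = fc - 1.96 * sqrt(V),
             upper = fc + 1.96 * sqrt(V))
}

# Example with the simulation settings (delta = 0.25, sigma_e = 1),
# assuming a hypothetical last observation z_T = 10.
res <- rw_forecast(z_T = 10, delta = 0.25, sigma_e = 1, ell = 1:4)
print(res)
```

The forecasts follow a straight line with slope \(\delta\), while the interval width grows like \(\sqrt{\ell}\).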

Labor Force Model and Predictions

The bottom panel of the figure shows the Labor Force Participation Rate (LFPR) for women aged 20-44 living in a household with a spouse and one child under the age of six, for the years 1968-1998. There are only 31 years of data. The top panel, DIFF, shows the differenced LFPR series, which appears to be approximately white noise.

Random Walk with Deterministic Drift Model

The mean and standard deviation of the differenced time series were found to be 0.0120949 and 0.0100669. Hence the parameter estimates for the random walk model with deterministic drift are \(\hat{\delta} =\) 0.012 and \(\hat{\sigma}_e =\) 0.01. The last observed value of LFPR, for 1998, was 0.6407296, so the forecast function may be written \(z_{1998}(\ell) = 0.6407 + 0.0121 \times \ell\) with variance function \(V_\ell = (0.01)^2 \times \ell = 0.0001 \times \ell\).
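As a check on the arithmetic, the sketch below computes the first three forecasts (1999-2001) directly from the reported estimates \(\hat\delta = 0.0120949\), \(\hat\sigma_e = 0.0100669\) and \(z_{1998} = 0.6407296\). The ±2 standard error limits are an approximate 95% interval under an assumption of normal shocks.

```r
# Random walk with drift forecasts for LFPR from the reported estimates
# (the three numbers below are copied from the text).
delta_hat <- 0.0120949   # mean of the differenced series
sigma_hat <- 0.0100669   # sd of the differenced series
z_1998    <- 0.6407296   # last observation (1998)

ell <- 1:3                          # leads 1, 2, 3 -> 1999, 2000, 2001
fc  <- z_1998 + delta_hat * ell     # z_1998(ell)
V   <- sigma_hat^2 * ell            # V_ell = ell * sigma^2

print(data.frame(year     = 1998 + ell,
                 forecast = round(fc, 4),
                 variance = signif(V, 3),
                 lower    = round(fc - 2 * sqrt(V), 4),
                 upper    = round(fc + 2 * sqrt(V), 4)))
```

Note that the variance uses \(\hat\sigma_e^2 \approx 0.0001\), not \(\hat\sigma_e\) itself.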

Comparison LS and Random Walk

Regression Model

Another approach to modeling these data would be to fit a straight-line regression on YEAR and use it to predict future values. The least squares estimates are shown in the table below.

LS estimates.
               Estimate   Std. Error    t value   Pr(>|t|)
(Intercept)    -26.1189       0.8295   -31.4866     0.0000
YEAR             0.0134       0.0004    32.0694     0.0000

The coefficient of determination is \(R^2\) = 97.3%, with residual standard deviation \(\hat\sigma\) = 0.021 on 29 DF. So the regression estimate of the residual standard deviation is about 2.1 times larger than the random walk estimate \(\hat{\sigma}_e = 0.01\).

The figure Comparison LS and Random Walk also shows systematic lack of fit in the straight-line regression.
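The comparison of the two residual standard deviations can be sketched as follows. The LFPR data are not reproduced here, so this example uses a hypothetical simulated random walk of the same length (31 years) with drift 0.012, shock sd 0.01 and an arbitrary starting value; its ratio will differ from the 2.1 found for the real series and varies with the seed.

```r
# Straight-line regression vs random walk with drift on hypothetical data:
# a simulated series for 1968-1998 with drift 0.012 and shock sd 0.01.
set.seed(42)
YEAR <- 1968:1998
z <- 0.28 + cumsum(rnorm(length(YEAR), mean = 0.012, sd = 0.01))

# Least squares straight line: residual sd on n - 2 = 29 DF
fit_ls   <- lm(z ~ YEAR)
sigma_ls <- summary(fit_ls)$sigma

# Random walk with drift: mean and sd of the first differences
d        <- diff(z)
delta_rw <- mean(d)
sigma_rw <- sd(d)

print(round(c(sigma_ls = sigma_ls, sigma_rw = sigma_rw,
              ratio = sigma_ls / sigma_rw), 4))
```

`summary(lm)$sigma` is the residual standard error reported by R, which is the quantity compared with the random walk estimate above.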

Exercise 7.3. Euro Exchange Rates. Page 249.

a(i)

The basic requirement for stationarity is that the mean and variance of the time series should be constant over time. An additional requirement is that the autocorrelation function (ACF) must depend only on the lag separation, not on time.
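As a worked example, the random walk from the first section also fails the ACF requirement. Assuming \(z_0\) is fixed and the shocks are uncorrelated with variance \(\sigma_e^2\), for lag \(k \ge 0\) only the shared shocks \(e_1, \ldots, e_t\) contribute to the covariance:

```latex
\begin{aligned}
{\rm Cov}(z_t, z_{t+k}) &= {\rm Cov}\Big(\sum_{s=1}^{t} e_s,\ \sum_{s=1}^{t+k} e_s\Big) = t\,\sigma_e^2, \\
{\rm Corr}(z_t, z_{t+k}) &= \frac{t\,\sigma_e^2}{\sqrt{t\,\sigma_e^2 \cdot (t+k)\,\sigma_e^2}} = \sqrt{\frac{t}{t+k}}.
\end{aligned}
```

The autocorrelation depends on \(t\) as well as on the lag \(k\), so the random walk is not stationary even apart from its trending mean and growing variance.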

a(ii)

The exeuus series in the time series trace plot below shows very strong trends over time. The trend is not monotonic but exhibits the wandering behaviour typical of many financial time series. In the early part of the series exeuus increases, reaching a peak of 0.857 on November 14, 2005, and then declines steadily over the remaining period. So the mean is clearly not constant, and hence the series is nonstationary.

Daily Euro-US FX Rates

b(i)

In the figure below we fit a quadratic regression of exeuus on TIME, where TIME is the observation number 1, 2, …, 699, and plot the time series trace with the fitted regression superimposed.

Daily Euro-US FX Rates With Quadratic Regression Fit

When using R, the usual recommendation for fitting quadratic and other polynomial regressions is to use orthogonal polynomials, constructed by the R function poly(). For regression, orthogonal polynomials are merely a different basis and give the same fitted values as ordinary polynomials. Orthogonal polynomials can be numerically more accurate when the raw powers are nearly collinear, but with the double-precision arithmetic R uses this rarely matters for a low-degree fit like this one.
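The claim that the two bases give the same fit is easy to verify. The sketch below uses a hypothetical simulated random walk of length 699 in place of exeuus (seed, start value and shock sd are arbitrary) and compares `lm()` fits using raw powers and `poly()`.

```r
# Raw vs orthogonal quadratic bases: same fitted values, different coefficients.
# Hypothetical data: a simulated random walk of length 699 standing in for exeuus.
set.seed(7)
TIME <- 1:699
y <- 0.8 + cumsum(rnorm(699, sd = 0.0036))

fit_raw  <- lm(y ~ TIME + I(TIME^2))   # ordinary polynomial basis
fit_orth <- lm(y ~ poly(TIME, 2))      # orthogonal polynomial basis

# The fitted curves agree to within numerical tolerance
print(all.equal(fitted(fit_raw), fitted(fit_orth)))

# The coefficients differ because the basis functions differ
print(coef(fit_raw))
print(coef(fit_orth))

# TIME and TIME^2 are highly correlated; poly() removes this collinearity
print(cor(TIME, TIME^2))
```

The raw coefficients are directly interpretable as the intercept, linear and quadratic terms, which is the reason for using TIMESQ below.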

Because the coefficients of orthogonal polynomials are harder to interpret and to evaluate by hand, I have instead defined the variable TIMESQ as the square of TIME. The last five observations are listed below.

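library(dplyr)   # provides %>%, mutate() and tibbles
# Add the squared time index used as the quadratic term
fxdata <- fxdata %>% mutate(TIMESQ = TIME^2)
tail(fxdata, n=5)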

## # A tibble: 5 x 5
##   date       exhkus exeuus  TIME  TIMESQ
##   <date>      <dbl>  <dbl> <int>   <dbl>
## 1 2008-01-02   7.81  0.678   695 483025.
## 2 2008-01-03   7.80  0.679   696 484416.
## 3 2008-01-04   7.80  0.676   697 485809.
## 4 2008-01-07   7.80  0.681   698 487204.
## 5 2008-01-08   7.80  0.680   699 488601.

The fitted regression is displayed in the table below.

LS estimates.
               Estimate   Std. Error    t value
(Intercept)      0.8080       0.0019   433.9310
TIME             0.0001       0.0000    10.5426
TIMESQ          -0.0000       0.0000   -27.2975

The coefficient of determination is \(R^2\) = 87.3%, with residual standard deviation \(\hat\sigma\) = 0.016 on 696 DF. This is roughly 4.4 times larger than the random walk estimate \(\hat\sigma = 0.0036\) obtained in c(i) below.

b(ii)

There is clearly a large systematic departure from the fitted curve, indicating lack of fit. Here the systematic departure shows up as strong autocorrelation in the residuals, as can be seen in the ACF plot below.
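The residual autocorrelations behind such an ACF plot can be computed with `acf(..., plot = FALSE)`. The sketch below uses a hypothetical simulated random walk in place of exeuus (same arbitrary settings as a stand-in series); for a random walk the quadratic fit cannot remove the wandering, so the low-lag residual autocorrelations typically stay close to 1.

```r
# Residual ACF after a quadratic trend fit.
# Hypothetical data: a simulated random walk of length 699 in place of exeuus.
set.seed(7)
TIME <- 1:699
y <- 0.8 + cumsum(rnorm(699, sd = 0.0036))
fit <- lm(y ~ TIME + I(TIME^2))

r_acf <- acf(residuals(fit), lag.max = 20, plot = FALSE)
print(round(r_acf$acf[2:6], 3))   # lags 1 to 5 (element 1 is lag 0)

# Approximate 95% bounds for white noise: +/- 2 / sqrt(n)
print(2 / sqrt(length(y)))
```

Autocorrelations far outside the \(\pm 2/\sqrt{n}\) bounds indicate that the residuals are not white noise, so the regression's standard errors and \(R^2\) overstate its adequacy.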

b(iii)

The last observation in the time series corresponds to observation number \(t = 699\). The forecasts at lead times \(\ell = 1,2,3\) are computed by substituting the values \(t=700, 701, 702\) into the regression equation,

\[\hat{{\rm exeuus}}_t = 0.8080142 + 0.0001295210 \times {\rm TIME} - 4.639167\times 10^{-7} \times {\rm TIMESQ}.\]
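The substitution can be done by hand as a check. The sketch below uses the estimated coefficients \(0.8080142\), \(0.0001295210\) (TIME, consistent with the rounded table value 0.0001) and \(-4.639167\times 10^{-7}\) (TIMESQ); the results agree with the `predict()` output to about six decimal places, the small difference coming from rounding of the printed coefficients.

```r
# Evaluate the fitted quadratic trend at t = 700, 701, 702 by hand.
b0 <- 0.8080142       # intercept
b1 <- 0.0001295210    # TIME coefficient
b2 <- -4.639167e-7    # TIMESQ coefficient

t_new  <- 700:702
fc_reg <- b0 + b1 * t_new + b2 * t_new^2
print(round(fc_reg, 6))
```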

This is more easily done with the R function predict().

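# ans: the quadratic regression fitted in b(i),
# assumed to be lm(exeuus ~ TIME + TIMESQ, data = fxdata)
newdata <- tibble(TIME=700:702, TIMESQ=(700:702)^2)
predict(ans, newdata=newdata)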

##         1         2         3 
## 0.6713597 0.6708393 0.6703179

c(i)

We add DIFFEURO, the first difference of exeuus, to the R data frame and construct the time series trace plot. Since the loess trend is relatively flat, the series can be assumed to be approximately stationary in the mean. The variability of the series also appears approximately constant over time, so we assume stationarity.

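library(dplyr)
library(ggplot2)
# First differences; the first value is NA since observation 1 has no predecessor
fxdata <- fxdata %>%
  mutate(DIFFEURO = c(NA, diff(exeuus)))
# Drift and shock sd estimates, ignoring the leading NA
delta <- mean(fxdata$DIFFEURO, na.rm=TRUE)
sigma <- sd(fxdata$DIFFEURO, na.rm=TRUE)
# Trace plot of the differences with a loess trend (no confidence band)
fxdata %>%
  ggplot(aes(x=date, y=DIFFEURO)) +
   geom_line() +
   geom_smooth(method="loess", se=FALSE) +
   ggtitle("Differenced `exeuus` series with loess trend")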

We use the random walk with deterministic drift, since it is often preferred to the pure random walk for financial time series and includes it as the special case \(\delta = 0\). This model has two parameters to estimate: the drift and the standard deviation of the shocks. These are estimated by the sample mean and standard deviation of the differences. The resulting estimates are \(\hat\delta = -0.0001373926\) and \(\hat\sigma = 0.003621979\). The last value in the series is \({\rm exeuus}_{699} = 0.68\), so the prediction equation for this model may be written,

\[{\rm exeuus}_{699}(\ell) = 0.68 -0.0001373926 \ell\]

and the variance of the \(\ell\)-step prediction is

\[V(\ell) = (0.003621979)^2 \ell.\]

An approximate 95% prediction interval for the lead \(\ell\) forecast is

\[ 0.68 -0.0001373926 \ell \pm 2\times 0.003621979 \times \sqrt{\ell}.\]
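The interval formula can be tabulated for the first few lead times. This sketch copies the estimates from the text (\(\hat\delta = -0.0001373926\), \(\hat\sigma = 0.003621979\), \({\rm exeuus}_{699} = 0.68\)) and uses the same ±2 standard error multiplier.

```r
# Random walk with drift forecasts and approximate 95% intervals for exeuus
# (estimates copied from the text).
delta_hat <- -0.0001373926
sigma_hat <- 0.003621979
z_699     <- 0.68

ell  <- 1:5
fc   <- z_699 + delta_hat * ell        # point forecasts
half <- 2 * sigma_hat * sqrt(ell)      # half-widths: 2 * sqrt(V(ell))
print(data.frame(lead = ell, forecast = fc,
                 lower = fc - half, upper = fc + half))
```

The drift moves the forecast by only about 0.00014 per day, while the half-width is already about 0.0072 at lead 1, so the drift is negligible relative to forecast uncertainty.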

Remark: the mean of the differenced series DIFFEURO is not significantly different from zero according to either a t-test or the robust Wilcoxon signed rank test, wilcox.test().

# wilcox.test() drops the leading NA itself; na.action only applies to the formula method
wilcox.test(fxdata$DIFFEURO)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  fxdata$DIFFEURO
## V = 110590, p-value = 0.255
## alternative hypothesis: true location is not equal to 0

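So it may be better to omit the deterministic drift term and use the pure random walk, whose forecast is simply the last observed value, \({\rm exeuus}_{699}(\ell) = 0.68\) for all \(\ell\).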