The formula sheet reduces some memorization burden, but it does not remove the need to know definitions, interpretation language, and model-checking logic. In this course, many questions are not about recalling a formula mechanically. They are about recognizing what structure is present, what assumptions are being used, and what a plot or covariance pattern is telling us.

1. Weak Stationarity

A time series $\{X_t\}$ with finite second moments is weakly stationary if two conditions hold:

  1. The mean is constant over time: $E[X_t] = \mu$ for all $t$.
  2. The covariance depends only on the lag $h$, not on absolute time: $\mathrm{Cov}(X_{t+h}, X_t) = \gamma(h)$ for all $t$.

This is the operational definition behind almost every stationarity question. In exam language, a correct justification usually has to say both that the mean does not depend on $t$ and that the covariance depends only on $h$.

A useful mental compression is:

  • stationary mean
  • lag-based covariance

If either fails, the process is not weakly stationary.
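Both conditions can be checked empirically by simulation. The sketch below (a minimal pure-Python Monte Carlo check; the helper names such as `var_at` are ours, not course notation) estimates $\mathrm{Var}(X_t)$ at two different times: for white noise the variance is the same at every $t$, while for a random walk it grows with $t$, so the lag-only covariance condition fails.

```python
import random

random.seed(42)

def simulate(n_steps, walk=False):
    """One path of white noise, or its cumulative sum (a random walk)."""
    z = [random.gauss(0.0, 1.0) for _ in range(n_steps)]
    if not walk:
        return z
    path, s = [], 0.0
    for e in z:
        s += e
        path.append(s)
    return path

def var_at(t, walk, reps=2000):
    """Monte Carlo estimate of Var(X_t) across independent replications."""
    vals = [simulate(t + 1, walk)[t] for _ in range(reps)]
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / reps

# White noise: Var(X_t) is roughly 1 at every t.
print(var_at(10, walk=False), var_at(200, walk=False))
# Random walk: Var(X_t) = (t+1) * sigma^2 grows with t, so it is not stationary.
print(var_at(10, walk=True), var_at(200, walk=True))
```

The same comparison is the intuition behind the "persistent ACF" warning for random walks later in these notes.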

2. ACVF, ACF, and PACF

The autocovariance function is

$$\gamma(h) = \mathrm{Cov}(X_{t+h}, X_t).$$

It measures the linear dependence between observations $h$ time units apart, in the original covariance scale.

The autocorrelation function is

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)}.$$

It is the standardized version of the autocovariance function, so it is easier to compare across lags and across models.

The partial autocorrelation function at lag $h$ measures the direct linear relationship between $X_t$ and $X_{t+h}$ after removing the linear effect of the intermediate observations $X_{t+1}, \dots, X_{t+h-1}$.

These three objects answer different questions:

  • ACVF: how large is the lag-$h$ covariance
  • ACF: how strong is the lag-$h$ correlation after standardization
  • PACF: does lag $h$ still matter directly once lower lags have been accounted for

That last point is the key reason PACF is useful for identifying autoregressive order.
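The sample versions of these quantities follow the definitions directly. A minimal pure-Python sketch (using the standard convention of dividing by $n$, not $n-h$, in the sample ACVF):

```python
def sample_acvf(x, h):
    """Sample autocovariance at lag h: (1/n) * sum of centered cross-products.

    Divides by n (not n - h), the usual sample ACVF convention.
    """
    n = len(x)
    m = sum(x) / n
    return sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n

def sample_acf(x, h):
    """Sample autocorrelation: the lag-h autocovariance standardized by lag 0."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

x = [1.0, -1.0] * 50          # a strongly negatively autocorrelated toy series
print(sample_acf(x, 1))       # -0.99: near-perfect lag-1 anticorrelation
```

The sample PACF is more work (it comes from a recursion rather than a single formula), which is why software is normally used for it; a sketch appears with the identification rules below.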

3. Causality and Invertibility

For an ARMA model written as

$$\phi(B) X_t = \theta(B) Z_t,$$

causality and invertibility are determined by the roots of the associated polynomials.

For causality, look at the AR polynomial $\phi(z)$. The process is causal if all roots of $\phi(z) = 0$ lie outside the unit circle, meaning $|z| > 1$ for every root.

For invertibility, look at the MA polynomial $\theta(z)$. The process is invertible if all roots of $\theta(z) = 0$ lie outside the unit circle, again meaning $|z| > 1$ for every root.

The interpretation is important:

  • causal means $X_t$ can be represented as a convergent linear filter of current and past white noise terms
  • invertible means the white noise sequence can be recovered from current and past observations through a convergent linear filter

The phrase to remember is simple: all relevant roots must lie outside the unit circle.
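For AR order at most 2, the root check can be done by hand with the quadratic formula. A small sketch under that assumption (the function names are ours; `cmath` handles complex roots):

```python
import cmath

def ar_roots(phi1, phi2=0.0):
    """Roots of the AR polynomial phi(z) = 1 - phi1*z - phi2*z^2."""
    if phi2 == 0.0:
        return (1.0 / phi1,)           # AR(1): single root at z = 1/phi1
    a, b, c = -phi2, -phi1, 1.0        # rewrite as -phi2*z^2 - phi1*z + 1 = 0
    d = cmath.sqrt(b * b - 4 * a * c)  # complex sqrt covers complex-root cases
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

def is_causal(phi1, phi2=0.0):
    """Causal iff every root of phi(z) lies strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar_roots(phi1, phi2))

print(is_causal(0.5))        # True:  root z = 2 is outside the unit circle
print(is_causal(1.5))        # False: root z = 2/3 is inside the unit circle
print(is_causal(0.5, 0.3))   # True:  both roots lie outside the unit circle
```

The invertibility check is identical in form: apply the same root test to $\theta(z) = 1 + \theta_1 z + \theta_2 z^2$ (with this code's sign convention, pass $-\theta_1$ and $-\theta_2$).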

4. White Noise and the Logic of White Noise Testing

A white noise process $\{Z_t\}$ in this course is characterized by

  • $E[Z_t] = 0$
  • $\mathrm{Var}(Z_t) = \sigma^2$
  • $\mathrm{Cov}(Z_t, Z_{t-k}) = 0$ for all lags $k \neq 0$

So white noise has constant mean, constant variance, and no serial correlation beyond lag $0$.

The testing idea is not to prove literal perfection from finite data. The practical question is whether the sample ACF shows evidence against white noise. If a series is behaving like white noise, then most sample autocorrelations at nonzero lags should fluctuate around $0$ without systematic structure, and most spikes should remain within the approximate $\pm 1.96/\sqrt{n}$ confidence bounds.

If many spikes are clearly outside the bounds, or the sample ACF shows trend-like decay, oscillation, or seasonal repetition, then the white noise hypothesis is not well supported.
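This logic is easy to mechanize. A minimal sketch (function names are ours) that counts sample-ACF spikes outside the $\pm 1.96/\sqrt{n}$ band: white noise should produce roughly $5\%$ of spikes outside, while a structured series produces many.

```python
import math
import random

def sample_acf(x, h):
    """Sample autocorrelation at lag h (autocovariance standardized by lag 0)."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x) / n
    gh = sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
    return gh / g0

def spikes_outside_bounds(x, max_lag=20):
    """Count ACF spikes at lags 1..max_lag outside the +/- 1.96/sqrt(n) band."""
    bound = 1.96 / math.sqrt(len(x))
    return sum(1 for h in range(1, max_lag + 1) if abs(sample_acf(x, h)) > bound)

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
# For genuine white noise we expect roughly 1 of 20 spikes (5%) outside the band.
print(spikes_outside_bounds(noise))
```

A seasonal series run through the same counter lights up at many lags, which is exactly the "systematic structure" the white noise hypothesis rules out.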

5. How ACF and PACF Identify AR, MA, ARMA, and Random Walk Structure

These recognition rules must be automatic.

AR($p$)

For an AR($p$) process, the ACF typically tails off rather than cutting off sharply. It often decays exponentially or in a damped oscillatory pattern. The PACF cuts off after lag $p$.

So the signature is:

  • ACF tails off
  • PACF cuts off at lag $p$

MA($q$)

For an MA($q$) process, the ACF cuts off after lag $q$, while the PACF tails off gradually.

So the signature is:

  • ACF cuts off at lag $q$
  • PACF tails off

ARMA($p,q$)

For ARMA models, neither the ACF nor the PACF usually has a clean finite cutoff. Both tend to tail off.

So the signature is:

  • ACF tails off
  • PACF tails off

White Noise

For white noise, both ACF and PACF should be near zero at all nonzero lags, apart from random sampling fluctuation.

Random Walk

A random walk is not stationary, because its variance grows with $t$. Its sample ACF does not die out the way a stationary process's ACF should: it typically decays very slowly, staying close to $1$ across many lags. In practical plot interpretation, a very persistent ACF is a warning sign that differencing may be needed before ARMA-style modelling makes sense.
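The AR/MA signatures above can be verified against the theoretical ACF and PACF. The sketch below uses the known closed forms for AR(1) ($\rho(h) = \phi^h$, tails off) and MA(1) ($\rho(1) = \theta/(1+\theta^2)$, then exact zeros, cuts off), and computes the PACF from the ACF via the Durbin-Levinson recursion (the function names are ours):

```python
def acf_ar1(phi, h):
    """Theoretical ACF of AR(1): geometric decay phi^h -- it tails off."""
    return phi ** h

def acf_ma1(theta, h):
    """Theoretical ACF of MA(1): nonzero only at lags 0 and 1 -- it cuts off."""
    if h == 0:
        return 1.0
    if h == 1:
        return theta / (1.0 + theta ** 2)
    return 0.0

def pacf(rho, h):
    """PACF at lag h >= 1 from an ACF function rho, via Durbin-Levinson."""
    phi = [rho(1)]                                   # phi_{1,1}
    for n in range(2, h + 1):
        num = rho(n) - sum(phi[k] * rho(n - 1 - k) for k in range(n - 1))
        den = 1.0 - sum(phi[k] * rho(k + 1) for k in range(n - 1))
        phi_nn = num / den
        phi = [phi[k] - phi_nn * phi[n - 2 - k] for k in range(n - 1)] + [phi_nn]
    return phi[-1]

ar = lambda h: acf_ar1(0.6, h)
ma = lambda h: acf_ma1(0.6, h)
print([round(ar(h), 3) for h in range(5)])    # ACF tails off, never exactly 0
print([round(ma(h), 3) for h in range(5)])    # ACF is exactly 0 beyond lag 1
print(pacf(ar, 1), pacf(ar, 2))               # AR(1) PACF cuts off after lag 1
print(pacf(ma, 1), pacf(ma, 2))               # MA(1) PACF keeps tailing off
```

This is the mirrored pattern in the tables above: AR has the cutoff in the PACF, MA has it in the ACF.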

6. Technical Language for Plot Interpretation

The exam can ask for short written interpretation, not just calculation. The right style is compact, technical, and tied to visible features of the plot.

Time Plot

Use language like:

  • The series shows an upward trend.
  • The series fluctuates around a roughly constant level.
  • There is clear seasonality with period $d$.
  • The variability appears roughly constant over time.
  • The mean level changes over time, so the series does not appear stationary.

The general rule is: point to a visible feature, then state its modelling implication.

ACF Plot

Use language like:

  • The ACF cuts off after lag $q$, suggesting an MA($q$) structure.
  • The ACF decays gradually, suggesting an AR-type structure.
  • Significant spikes at seasonal lags suggest a seasonal component.
  • The ACF does not die out, which is evidence against stationarity.

PACF Plot

Use language like:

  • The PACF cuts off after lag $p$, suggesting an AR($p$) model.
  • The PACF tails off, so a pure finite-order AR cutoff pattern is not supported.

Forecast Plot

Use language like:

  • Prediction intervals widen as the forecast horizon increases.
  • This model tracks the observed series more closely on the test set.
  • This model appears more accurate, but interval calibration should also be checked.
  • The forecasts are systematically biased high or low relative to the observed values.

The key is not to describe the plot vaguely. The goal is to convert visual evidence into model-relevant conclusions.

7. Residual Analysis and Model Checking

Residual analysis asks whether the fitted model has removed the temporal structure it was supposed to explain.

If a model is adequate, the residuals should behave approximately like white noise. That means:

  • residuals should fluctuate around zero
  • there should be no obvious remaining trend
  • there should be no obvious remaining seasonality
  • the residual ACF should show no systematic significant structure at nonzero lags

If the residuals still display dependence, then the model has failed to capture all relevant structure. In other words, the model is inadequate, not because it is mathematically illegal, but because it leaves predictable structure in the leftover series.

A useful sentence for short-answer responses is:

Residual analysis checks whether the fitted model has absorbed the serial dependence in the data. If the residuals still show temporal structure, the model is missing something important.
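One standard way to turn "no systematic residual ACF structure" into a single number is the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, which is approximately chi-square under the white noise hypothesis. A minimal sketch (function names are ours; in practice the degrees of freedom are reduced by the number of fitted ARMA parameters):

```python
import random

def sample_acf(x, h):
    """Sample autocorrelation at lag h (autocovariance standardized by lag 0)."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x) / n
    gh = sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
    return gh / g0

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: large Q means the residual ACF carries structure."""
    n = len(residuals)
    return n * (n + 2) * sum(
        sample_acf(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

random.seed(7)
resid = [random.gauss(0.0, 1.0) for _ in range(400)]
# Under the white noise hypothesis, Q here is approximately chi-square with
# 10 degrees of freedom; the 95% critical value is about 18.31, so a Q far
# above that is evidence the model left structure in the residuals.
print(ljung_box_q(resid, max_lag=10))
```

Residuals that still carry AR-type dependence produce a $Q$ orders of magnitude above the critical value, which is the quantitative version of "the model is missing something important."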

8. Forecasting Notation: How the Symbols Map to the Question

The formula sheet gives the best linear predictor in the form

$$P(Y \mid W) = \mu_Y + a^T (W - \mu_W),$$

where $a$ solves

$$\Gamma a = \gamma,$$

and the mean squared error is

$$E\left[\left(Y - P(Y \mid W)\right)^2\right] = \mathrm{Var}(Y) - a^T \gamma.$$

The real challenge is not memorizing these formulas. It is identifying each object correctly in context.

What is $Y$?

$Y$ is the quantity you want to predict. Depending on the question, it could be:

  • the next observation, such as $X_{n+1}$
  • a future observation, such as $X_{n+h}$
  • a missing observation, such as $X_2$

What is $W$?

$W$ is the vector of observed variables used to predict $Y$. For example:

  • $W = (X_n, X_{n-1}, \dots, X_1)^T$
  • $W = (X_1, X_3)^T$ if a missing value is being estimated from neighboring observations

What are $\mu_Y$ and $\mu_W$?

These are just the means:

  • $\mu_Y = E[Y]$
  • $\mu_W = E[W]$

If the process has mean zero, these simplify immediately.

What is $\Gamma$?

$\Gamma$ is the covariance matrix of the predictor vector $W$:

$$\Gamma = \mathrm{Cov}(W, W).$$

Its $(i,j)$ entry is $\mathrm{Cov}(W_i, W_j)$.

What is $\gamma$?

$\gamma$ is the covariance vector between the predictor vector and the target:

$$\gamma = \mathrm{Cov}(W, Y).$$

Its $i$th entry is $\mathrm{Cov}(W_i, Y)$.

What is $a$?

$a$ is the coefficient vector of the best linear predictor. Once $\Gamma$ and $\gamma$ are identified, solve $\Gamma a = \gamma$, then substitute into the predictor formula.

For stationary processes, both $\Gamma$ and $\gamma$ are usually filled using the ACVF $\gamma(h)$ according to the relevant lags. That is why forecasting problems in time series are often really covariance-structure problems in disguise.
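A concrete worked instance makes the mapping mechanical. Take a mean-zero AR(1) with $\phi = 0.6$, $\sigma^2 = 1$, and predict $Y = X_3$ from $W = (X_2, X_1)^T$. The sketch below (helper names are ours; the $2\times 2$ system is solved by Cramer's rule) fills $\Gamma$ and $\gamma$ from the AR(1) ACVF $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$:

```python
def ar1_acvf(phi, sigma2, h):
    """ACVF of a causal AR(1): gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(h) / (1.0 - phi ** 2)

def solve_2x2(G, g):
    """Solve Gamma a = gamma for a 2x2 system by Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    a1 = (g[0] * G[1][1] - G[0][1] * g[1]) / det
    a2 = (G[0][0] * g[1] - g[0] * G[1][0]) / det
    return a1, a2

phi, sigma2 = 0.6, 1.0
g = lambda h: ar1_acvf(phi, sigma2, h)

# Predict Y = X_3 from W = (X_2, X_1)^T for a mean-zero AR(1).
Gamma = [[g(0), g(1)], [g(1), g(0)]]   # Cov(W, W), entry (i,j) = gamma(|i-j|)
gamma_vec = [g(1), g(2)]               # Cov(W, Y): lags from X_2, X_1 to X_3
a1, a2 = solve_2x2(Gamma, gamma_vec)

# MSE = Var(Y) - a^T gamma
mse = g(0) - (a1 * gamma_vec[0] + a2 * gamma_vec[1])
print(a1, a2, mse)   # a1 = phi = 0.6, a2 = 0, and the one-step MSE = sigma^2
```

The answer $a = (\phi, 0)^T$ is reassuring: for an AR(1), the best linear one-step predictor depends only on the most recent observation, and the prediction MSE is exactly the innovation variance $\sigma^2$.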

9. What Actually Has to Be Remembered

The formula sheet helps with some algebraic templates, but it does not replace conceptual memory. What must still be known cold is:

  • the definition of weak stationarity
  • what ACVF, ACF, and PACF mean
  • the root conditions for causality and invertibility
  • the meaning of white noise and how to assess it from a sample ACF
  • the identification patterns for AR, MA, ARMA, and nonstationary behavior
  • technical language for plot interpretation
  • the role of residual analysis in model checking
  • how to map a forecasting problem into $Y$, $W$, $\mu_Y$, $\mu_W$, $\Gamma$, and $\gamma$

That is the material that tends to separate mechanical formula use from actual understanding.