
Time Series Analysis: Cointegration and Non-Stationarity

Many time series (prices, revenue, interest rates, exchange rates, web traffic, demand) do not hover around a fixed average. They trend, drift, and often respond to shocks that do not quickly fade. This behaviour is called non-stationarity. If you apply ordinary regression or standard forecasting assumptions to such data, the output can look “statistically strong” while being misleading. Cointegration addresses a common pattern in real systems: two or more non-stationary series can still share a stable long-term equilibrium relationship. This concept is a frequent topic in data science classes in Bangalore because it prevents common mistakes when modelling trending variables.

1) Non-stationarity and the risk of spurious regression

A stationary series has statistical properties (mean and variance) that are stable over time. A non-stationary series often contains a unit root, meaning shocks can have persistent effects. For instance, if a price level jumps due to a supply disruption, it may not revert to an earlier baseline in the way a stationary process would.

Why does this matter? When two unrelated series both trend upward, regressing one on the other can produce a high R² and significant coefficients simply because both are trending. This is spurious regression: the model captures shared drifting behaviour, not a meaningful connection.
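A minimal numpy sketch makes this concrete: below, two random walks are generated independently (so there is no true relationship), yet regressing one on the other frequently produces a sizeable R². The data, seed, and sample size are illustrative choices, not from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.standard_normal(n))  # a random walk
y = np.cumsum(rng.standard_normal(n))  # a second, completely unrelated random walk

# OLS of y on x with an intercept
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"R^2 between unrelated random walks: {r2:.3f}")
```

Because both series drift, the fitted line often tracks the shared wandering, and the R² can be large even though the true relationship is zero. Re-running with different seeds shows how unstable such "relationships" are.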

Practical ways to diagnose non-stationarity include:

  • Visual checks for trends, level shifts, and changing volatility
  • Autocorrelation patterns that decay very slowly
  • Unit-root tests such as the Augmented Dickey–Fuller (ADF) test
  • Transformations (log) and differencing to stabilise the series

If first differencing makes the series stationary, it is integrated of order one, written I(1). Cointegration is mainly concerned with I(1) variables.
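The differencing idea can be sketched with a simplified Dickey–Fuller-style statistic (no lag augmentation, asymptotic critical value only; in practice you would use a full ADF implementation such as `statsmodels.tsa.stattools.adfuller`). The simulated walk and the rough -2.9 threshold below are illustrative assumptions.

```python
import numpy as np

def df_stat(y):
    """t-statistic on phi in: diff(y)_t = alpha + phi * y_{t-1} + e_t.
    Values well below roughly -2.9 (the asymptotic 5% critical value
    with a constant) suggest rejecting a unit root. Simplified sketch:
    no lag augmentation, so it assumes serially uncorrelated errors."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(400))      # I(1) by construction
print("level series:", df_stat(walk))            # usually fails to reject a unit root
print("differenced :", df_stat(np.diff(walk)))   # strongly negative: stationary
```

The pattern to look for: the level series fails the test, the first difference passes it, which is exactly the I(1) signature described above.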

2) Cointegration: long-run equilibrium inside short-run wandering

Two (or more) I(1) series are cointegrated if a particular linear combination of them is stationary. In plain terms, the variables may drift individually, but they do not drift apart indefinitely once you account for their equilibrium relationship. A useful analogy is “two boats tied together”: waves move them around day to day, but the rope limits how far apart they can get in the long run.

Cointegration is plausible in settings such as wages and consumer prices, spot and futures prices for a commodity, or a product’s selling price and a key input cost. The key point is not that the series move together every period, but that deviations from equilibrium tend to be temporary and self-correcting. Recognising this structure helps you avoid throwing away long-run information by differencing everything, an idea often reinforced in data science classes in Bangalore.
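The "two boats" picture can be simulated directly. Below, y is built as 1.5·x plus a stationary AR(1) deviation, so y and x each wander without bound, but the spread y − 1.5x stays in a narrow band. The coefficients 1.5 and 0.7 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = np.cumsum(rng.standard_normal(n))   # I(1) driver (one boat)
u = np.zeros(n)                         # stationary AR(1) deviation (the rope's slack)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.standard_normal()
y = 1.5 * x + u                         # cointegrated with x: y - 1.5x = u is stationary

spread = y - 1.5 * x
# the levels wander far from their start; the spread does not
print("range of x:     ", x.max() - x.min())
print("range of spread:", spread.max() - spread.min())
```

The individual series cover a wide range, while the spread's range stays bounded, which is exactly the "drift individually but not apart" behaviour the definition describes.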

3) Testing for cointegration

Two common approaches are:

Engle–Granger two-step method (best for two variables)

  1. Estimate a long-run equation, for example, y_t = α + βx_t + ε_t.
  2. Test whether the residuals ε_t are stationary (often via an ADF-style test on residuals).
  3. Stationary residuals imply cointegration.
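The two steps above can be sketched in numpy on simulated data (in practice, `statsmodels.tsa.stattools.coint` does this with proper Engle–Granger critical values). This sketch omits lag augmentation, and the -3.34 critical value quoted in the comment is the approximate asymptotic 5% Engle–Granger value for two variables.

```python
import numpy as np

def engle_granger(y, x):
    """Engle–Granger two-step sketch (numpy-only, no lag augmentation).
    Step 1: OLS long-run regression y = a + b*x + eps.
    Step 2: Dickey–Fuller-style t-stat on the residuals; values well below
    roughly -3.34 (approximate 5% critical value) suggest cointegration."""
    X = np.column_stack([np.ones(len(x)), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    eps = y - a - b * x
    de = np.diff(eps)
    phi = (eps[:-1] @ de) / (eps[:-1] @ eps[:-1])
    resid = de - phi * eps[:-1]
    se = np.sqrt((resid @ resid) / (len(de) - 1) / (eps[:-1] @ eps[:-1]))
    return phi / se, b

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(800))
y = 2.0 * x + rng.standard_normal(800)   # cointegrated by construction (beta = 2)
t_stat, b_hat = engle_granger(y, x)
print(f"residual DF stat: {t_stat:.2f}, estimated beta: {b_hat:.2f}")
```

Note that ordinary ADF critical values do not apply in step 2 because the residuals come from an estimated regression; dedicated Engle–Granger tables (or `coint`) should be used in real work.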

Johansen method (preferred for multiple variables)

Johansen’s framework tests cointegration in a multivariate system and can estimate how many distinct long-run relationships exist (the cointegration “rank”). This is useful when three or more variables may be linked by equilibrium constraints.

In both cases, results depend on whether you include an intercept or trend and on having enough history. If the system experiences regime changes, cointegration may weaken or shift, so combine statistical tests with domain context.

4) Modelling cointegrated series: ECM and VECM

Once cointegration is established, the standard modelling approach is an Error Correction Model (ECM) for two variables or a Vector Error Correction Model (VECM) for multiple variables. The defining feature is an error-correction term: the lagged equilibrium deviation. If the relationship was “out of balance” last period, the model predicts adjustments that push the system back toward equilibrium. The coefficient on this term represents the speed of adjustment.
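A two-variable ECM can be illustrated on simulated data where the true speed of adjustment is known. Below, y closes 30% of last period's equilibrium gap each step; the two-step fit (long-run regression, then the ECM in differences) should recover a clearly negative adjustment coefficient. All coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 800
x = np.cumsum(rng.standard_normal(n))
y = np.empty(n)
y[0] = x[0]
for t in range(1, n):
    gap = y[t - 1] - x[t - 1]                 # last period's equilibrium deviation
    # y closes 30% of the gap each period, plus a short-run response to x
    y[t] = y[t - 1] - 0.3 * gap + 0.5 * (x[t] - x[t - 1]) + rng.standard_normal()

# Step 1: long-run regression gives the error-correction term (ECT)
X = np.column_stack([np.ones(n), x])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
ect = y - a - b * x

# Step 2: ECM regression  diff(y)_t = c + gamma*ECT_{t-1} + delta*diff(x)_t + e_t
dy, dx = np.diff(y), np.diff(x)
Z = np.column_stack([np.ones(n - 1), ect[:-1], dx])
(c, gamma, delta), *_ = np.linalg.lstsq(Z, dy, rcond=None)
print(f"speed of adjustment (gamma): {gamma:.2f}")  # negative: pulls back to equilibrium
```

A significantly negative gamma is the signature of error correction: the further the system was from equilibrium last period, the larger the predicted corrective move this period.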

A practical workflow:

  1. Confirm integration order (I(0) vs I(1)) using plots and unit-root tests.
  2. Test cointegration (Engle–Granger for two series; Johansen for three+).
  3. Fit ECM/VECM and validate with rolling backtests rather than a single split.
  4. Check diagnostics (autocorrelation, stability) and watch for structural breaks.

This end-to-end pipeline is a common project pattern in data science classes in Bangalore because it produces models that are statistically defensible and explainable to stakeholders.

Conclusion

Non-stationarity is common in real-world time series, and ignoring it can create misleading, spurious relationships. Cointegration provides a principled way to model non-stationary variables that share a long-run equilibrium, preserving both short-run dynamics and long-run structure. By testing for unit roots, validating cointegration, and using ECM/VECM to model equilibrium adjustment, you can build time series models that are more reliable for forecasting and decision-making, skills you can apply directly after attending data science classes in Bangalore.
