Serial Correlation: Durbin-Watson Test, HAC Standard Errors & FGLS

Serial correlation in regression is one of the most common — and most overlooked — problems in time series econometrics. When regression errors are correlated across time periods, standard OLS inference breaks down: standard errors become unreliable, t-statistics are inflated, and researchers risk drawing false conclusions. Whether you’re modeling bond yield dynamics, forecasting interest rates, or analyzing stock return predictability, understanding how to detect and correct serial correlation is essential for valid statistical inference.

What Is Serial Correlation?

Serial correlation (also called autocorrelation) occurs when the error terms in a regression model are correlated across time periods. In a properly specified model with independent errors, knowing today’s error tells you nothing about tomorrow’s. With serial correlation, that independence breaks down — errors exhibit a pattern over time.

Key Concept

Serial correlation means that Corr(ut, us) ≠ 0 for t ≠ s. In plain terms, today’s regression error is correlated with yesterday’s error — the errors are not independent across time.

The most common form is first-order serial correlation, modeled as an AR(1) process:

AR(1) Error Process
ut = ρut−1 + et,   |ρ| < 1
The error at time t depends on the previous period’s error, where ρ is the serial correlation parameter and et is a white noise innovation term
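
A minimal simulation (illustrative parameters, not taken from the article's examples) makes the persistence concrete: with ρ = 0.7, the sample first-order autocorrelation of the simulated errors recovers the true parameter.

```python
import numpy as np

# Simulate the AR(1) error process u_t = rho * u_{t-1} + e_t
rng = np.random.default_rng(42)
T, rho = 500, 0.7                  # illustrative values
e = rng.standard_normal(T)         # white noise innovations e_t
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

# Sample first-order autocorrelation of the simulated errors
rho_hat = np.corrcoef(u[1:], u[:-1])[0, 1]
```

Plotting u for positive ρ shows the characteristic runs above and below zero discussed next.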

Positive serial correlation (ρ > 0) is by far the most common in financial data. A positive error today tends to be followed by a positive error tomorrow — residuals cluster in runs above and below zero. This arises naturally because economic conditions like interest rate regimes, credit cycles, and business cycle phases persist across periods.

Negative serial correlation (ρ < 0) is less common but can occur in models with overcorrection dynamics, where a positive error today is followed by a negative error tomorrow.

Serial correlation typically arises from omitted slowly-changing variables, model misspecification (such as omitting a time trend or seasonal component), or inherent inertia in economic data. Interest rate series, corporate earnings growth, and credit spreads are classic examples of data that generate serially correlated regression errors.

Consequences of Serial Correlation for OLS

Critical Distinction

Under standard assumptions (strictly exogenous regressors in a static or finite distributed lag model), serial correlation does not bias OLS coefficient estimates — they remain unbiased and consistent. The problem is entirely with inference: standard errors, t-statistics, confidence intervals, and hypothesis tests become unreliable.

When positive serial correlation is present in time series regression, four problems emerge:

  1. Standard errors are biased downward — OLS assumes each observation provides independent information. With positive serial correlation, consecutive observations carry overlapping information, so the effective sample size is smaller than OLS assumes. The result: standard errors that are too small.
  2. t-statistics are inflated — Because standard errors are too small, the corresponding t-statistics are too large. Researchers may find “statistically significant” relationships that are actually noise.
  3. Confidence intervals are too narrow — The understated standard errors produce confidence intervals that fail to achieve their nominal coverage level (e.g., a “95%” interval may actually cover the true parameter only 80% of the time).
  4. Model misspecification often accompanies serial correlation — While R-squared itself remains a consistent measure under stationarity, serial correlation is frequently a symptom of a misspecified model (missing lags, trends, or structural breaks), and that misspecification can inflate the apparent goodness of fit.

Under serial correlation, OLS is no longer BLUE (Best Linear Unbiased Estimator) — more efficient estimators exist that exploit the error structure.

Pro Tip

If your time series regression shows suspiciously high t-statistics and a very smooth residual plot (residuals staying positive or negative for extended stretches), serial correlation is a likely culprit. Always test before trusting your results.

Before You Correct — Check Your Model

Serial correlation is often a diagnostic signal that your model is misspecified — it may be missing important lags, a time trend, or seasonal controls. Before applying HAC standard errors or FGLS, consider whether adding omitted variables or restructuring the model eliminates the serial correlation. Sometimes the right fix is better model specification, not a statistical correction. First differencing — subtracting each variable’s previous-period value — is a separate approach used when unit roots are suspected, addressing non-stationarity rather than AR(1) error dependence.

Testing for Serial Correlation

The Durbin-Watson Test

The Durbin-Watson (DW) test is the classic diagnostic for first-order serial correlation. It uses the OLS residuals to compute a test statistic that indicates whether consecutive errors are correlated:

Durbin-Watson Statistic
DW = Σ(ût − ût−1)² / Σût²
Sum of squared differences between consecutive residuals, divided by the sum of squared residuals. The DW statistic ranges from 0 to 4.

The DW statistic has a simple relationship to the estimated serial correlation coefficient:

DW–Rho Relationship
DW ≈ 2(1 − ρ̂)
When ρ̂ = 0 (no serial correlation), DW ≈ 2. When ρ̂ approaches 1 (strong positive serial correlation), DW approaches 0.
DW Value | Implied ρ̂ | Interpretation
≈ 2.0 | ≈ 0 | No serial correlation
< 2.0 | > 0 | Positive serial correlation (most common)
> 2.0 | < 0 | Negative serial correlation
≈ 0 | ≈ 1 | Strong positive serial correlation
≈ 4.0 | ≈ −1 | Strong negative serial correlation
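
The DW statistic and the implied ρ̂ are easy to compute directly from residuals. The helper below is my own sketch of the formula (statsmodels also ships this as statsmodels.stats.stattools.durbin_watson); white noise residuals should give DW near 2.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared consecutive residual differences / sum of squared residuals."""
    resid = np.asarray(resid)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Serially uncorrelated residuals should give DW near 2
rng = np.random.default_rng(0)
white = rng.standard_normal(1000)
dw = durbin_watson(white)
rho_implied = 1 - dw / 2   # from the approximation DW ≈ 2(1 − ρ̂)
```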

The DW test uses critical value bounds (dL and dU) that depend on the sample size and number of regressors. For testing positive serial correlation: if DW < dL, reject the null; if DW > dU, fail to reject; values between dL and dU are inconclusive. For negative serial correlation, apply the same bounds to (4 − DW). The inconclusive region is a notable disadvantage — the Breusch-Godfrey test avoids this problem entirely.

Example: Treasury Yield Regression

A researcher regresses monthly changes in the 10-Year Treasury yield on the Federal Funds rate and CPI inflation (2000–2024, T = 300 months). The OLS regression produces a DW statistic of 0.87.

Using the DW–rho relationship: ρ̂ ≈ 1 − DW/2 = 1 − 0.87/2 = 0.565. With critical values dL = 1.72 and dU = 1.76 at the 5% level, DW = 0.87 is well below dL — strong evidence of positive serial correlation. The OLS standard errors from this regression cannot be trusted.

The Breusch-Godfrey LM Test

The Breusch-Godfrey (BG) test is more general than the Durbin-Watson test and is the preferred diagnostic in modern applied work. Its key advantages:

  • Valid when the model includes lagged dependent variables (where DW is biased)
  • Can test for higher-order serial correlation (e.g., AR(2), AR(4) for quarterly seasonality)
  • Produces a clear reject/fail-to-reject decision (no inconclusive region)

The procedure is straightforward: regress the OLS residuals on the original regressors plus q lagged residuals, then compute the LM statistic as LM = n × R²aux (where n is the number of observations in the auxiliary regression), which follows a χ²(q) distribution under the null hypothesis of no serial correlation up to order q.

t-Test for AR(1) Residuals

The simplest serial correlation test regresses the OLS residuals on their own first lag: ût = α + ρût−1 + error. The t-statistic on ρ̂ tests H0: ρ = 0. This simple version is valid when all regressors are strictly exogenous. When the model includes lagged dependent variables or other non-strictly-exogenous regressors, you must include all original regressors in the auxiliary regression alongside ût−1 to obtain a valid test — this is Wooldridge’s general version (Eq. 12.24), which is equivalent to the Breusch-Godfrey test with q = 1.

Serial Correlation-Robust Inference: Newey-West Standard Errors

Rather than correcting for serial correlation (which requires assuming a specific error structure), you can compute standard errors that are asymptotically valid regardless of the error structure. Newey-West standard errors — also called HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors — achieve this by adjusting the variance estimate to account for error dependence.

Newey-West (HAC) Variance Estimator
ν̂ = σ̂² + 2 Σh=1..g [1 − h/(g + 1)] × ĉh
The standard OLS variance is augmented with weighted autocovariance terms (ĉh) up to bandwidth g, using linearly declining Bartlett weights that ensure the estimate is non-negative

The bandwidth (g) determines how many lagged autocovariances to include. In practice, bandwidth values are typically small — Wooldridge notes that values of g = 1 or 2 are common for annual data, with moderately larger values for quarterly or monthly data. Most statistical software packages (Stata, R, EViews) compute a default bandwidth automatically based on the sample size. It is good practice to check sensitivity by trying a few different values — if your conclusions change substantially with different bandwidths, the results may not be robust.

Newey-West standard errors are robust to both serial correlation and heteroskedasticity — the time series analog of White’s robust standard errors used in cross-sectional analysis. They use the original OLS coefficient estimates, so no re-estimation is required.

HAC Correction in Practice

Returning to the Treasury yield regression (T = 300, DW = 0.87), the OLS t-statistic on the Federal Funds rate coefficient is 4.81 — highly significant. After computing Newey-West standard errors with bandwidth g = 2, the t-statistic drops to 2.14.

The coefficient is still statistically significant at the 5% level, but far less extreme than OLS suggested. Without the HAC correction, a researcher might have overstated the precision of this estimate by more than a factor of two.

Correcting for Serial Correlation: FGLS

When you are confident that the errors follow an AR(1) process and all regressors are strictly exogenous, you can go beyond adjusting standard errors and actually transform the data to eliminate the serial correlation. This approach — called Feasible Generalized Least Squares (FGLS) — can produce more efficient estimates than OLS. FGLS is not valid when the model contains lagged dependent variables or other non-strictly-exogenous regressors — in those cases, use Newey-West (HAC) standard errors instead.

The core idea is quasi-differencing: subtracting ρ times the lagged value from each variable removes the serial correlation from the errors:

Quasi-Differenced Equation
ỹt = yt − ρyt−1,   x̃t = xt − ρxt−1
Subtracting ρ times the lagged value from each variable produces transformed data with serially uncorrelated errors

Since ρ is unknown, FGLS uses the estimated ρ̂ from the auxiliary residual regression described in the t-test section above. The procedure:

  1. Run OLS on the original model and obtain residuals ût
  2. Regress ût on ût−1 to estimate ρ̂
  3. Quasi-difference all variables using ρ̂
  4. Run OLS on the transformed data to obtain FGLS estimates

Two standard implementations exist:

Method | First Observation | Key Feature
Cochrane-Orcutt | Dropped | Simpler; iterates until ρ̂ converges
Prais-Winsten | Retained (weighted by √(1 − ρ̂²)) | More efficient in small samples; preserves information
FGLS Example: Corporate Bond Spreads

A quarterly regression of BBB corporate bond spreads on GDP growth and the VIX index (2005–2024, T = 80) produces ρ̂ = 0.73, indicating strong positive serial correlation. Credit spreads are slow-moving and persistent — a textbook case for FGLS correction.

After applying Prais-Winsten estimation, the standard error on the VIX coefficient increases from 0.041 (OLS) to 0.068 (FGLS), and the coefficient on GDP growth loses statistical significance at the 5% level — a result masked by the artificially small OLS standard errors.

HAC Standard Errors vs FGLS

When serial correlation is detected, researchers face a choice between two correction strategies. The right choice depends on how confident you are in the error structure:

HAC Standard Errors (Newey-West)

  • Does not require specifying the error process
  • Asymptotically valid under any form of serial correlation (and heteroskedasticity)
  • Changes standard errors only — uses original OLS coefficient estimates, no re-estimation
  • Less efficient than FGLS when AR(1) is correctly specified
  • Current best practice in published finance research
  • Best for: robustness when the error structure is unknown

FGLS (Cochrane-Orcutt / Prais-Winsten)

  • Requires specifying the error process (typically AR(1)) and strictly exogenous regressors
  • More efficient than OLS+HAC when the AR(1) model is correct
  • Produces different coefficient estimates (not just different SEs)
  • Can be biased if the error process is misspecified
  • R-squared not directly comparable to OLS (different dependent variable)
  • Best for: efficiency when you are confident in the AR(1) structure

In current applied finance research, HAC standard errors are the default recommendation. They sacrifice some efficiency for robustness — a trade-off most researchers are willing to make, given the difficulty of verifying the exact error structure. FGLS remains valuable when the AR(1) model is well-supported and efficiency is a priority (e.g., small samples where every degree of precision matters).

ARCH/GARCH: A Related but Distinct Phenomenon

Serial correlation refers to dependence in the level of the errors — today’s error predicts tomorrow’s error. ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH models address a different problem: the conditional variance of the error changes over time (volatility clustering). A bond return series might exhibit both — serially correlated errors AND time-varying volatility. When ARCH effects are present, standard FGLS (which assumes constant variance in the transformed model) may be insufficient. For modeling time-varying volatility in financial returns, see our GARCH Volatility Calculator. For unit root testing and cointegration analysis, which addresses non-stationarity rather than error dependence, see our dedicated guide.

Common Mistakes

1. Using the Durbin-Watson test with lagged dependent variables. The DW test is biased toward 2 (toward finding no serial correlation) when the regression includes lagged values of the dependent variable as regressors. This means the test has low power precisely when serial correlation is most dangerous. Use the Breusch-Godfrey LM test instead, which remains valid with lagged dependent variables.

2. Confusing serial correlation with trending data. A trending time series can produce residuals that appear serially correlated even when the true errors are independent. If you regress a corporate earnings series on a macroeconomic variable without including a time trend, the residuals will cluster in runs simply because both variables trend over time. The solution is proper model specification — include time trends or detrend the data before testing for serial correlation.

3. Applying HAC or FGLS without first checking model specification. Serial correlation is often a symptom of a misspecified model — missing lags, omitted trends, or absent seasonal controls. Jumping straight to HAC standard errors or FGLS treats the symptom without addressing the cause. Always check whether adding omitted variables or restructuring the model eliminates the serial correlation before resorting to statistical corrections.

4. Assuming serial correlation biases OLS coefficient estimates. This is a common misconception. Under standard assumptions (strictly exogenous regressors), serial correlation does not bias OLS estimates — they remain unbiased and consistent. The problem is entirely with inference: standard errors, t-statistics, and confidence intervals are unreliable. You can trust the point estimates; you just need to fix the standard errors (or the model).

Frequently Asked Questions

Is serial correlation the same as autocorrelation?

They are the same concept — the terms are used interchangeably in econometrics. Both refer to the correlation between a variable (or an error term) and its own lagged values. Wooldridge and many econometrics textbooks prefer “serial correlation,” while time series and statistics textbooks often use “autocorrelation.” In practice, you will encounter both terms in academic papers and software output (e.g., Stata reports “Durbin-Watson d-statistic” while R’s acf() function computes the “autocorrelation function”).

What does a Durbin-Watson statistic of 1.0 indicate?

A DW statistic of 1.0 implies ρ̂ ≈ 0.5, indicating moderate positive serial correlation. Using the approximation DW ≈ 2(1 − ρ̂), a DW of 1.0 means roughly 50% of each period’s error carries over to the next period. This would typically lead to rejection of the null hypothesis of no serial correlation, meaning the OLS standard errors from this regression are unreliable and should be corrected using HAC standard errors or FGLS.

Should I use the Durbin-Watson test or the Breusch-Godfrey test?

In most applied work, the Breusch-Godfrey (BG) test is preferred. It is valid when the model includes lagged dependent variables (where the Durbin-Watson test is biased toward 2), it can detect higher-order serial correlation (AR(2), AR(4)), and it produces a clear reject/fail-to-reject decision with no inconclusive region. The Durbin-Watson test is simpler and still widely reported, but its critical values depend on the specific regressor matrix, it only tests for first-order serial correlation, and it has an inconclusive zone. Use DW as a quick visual check (is it near 2?) and BG as your formal diagnostic.

How do ARCH and GARCH differ from serial correlation?

ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized ARCH) model time-varying volatility — the tendency for large price swings to cluster together. This is distinct from serial correlation, which refers to dependence in the level of the errors. A financial time series can exhibit both: serially correlated errors (today’s error predicts tomorrow’s direction) and ARCH effects (today’s large error predicts tomorrow’s error will also be large in magnitude, regardless of direction). Serial correlation affects the mean equation; ARCH/GARCH affects the variance equation. For volatility modeling, try our GARCH Volatility Calculator.

Should I always use Newey-West standard errors in time series regressions?

Not necessarily. If your data shows no evidence of serial correlation (DW near 2, Breusch-Godfrey test insignificant), standard OLS inference is valid and more statistically efficient. Newey-West standard errors are a safeguard when serial correlation is present or suspected. Some researchers use them by default in time series work as a conservative practice — similar to using heteroskedasticity-robust standard errors in cross-sectional work — but this comes at the cost of slightly wider confidence intervals and reduced statistical power when serial correlation is absent.

Why does positive serial correlation make OLS standard errors too small?

With positive serial correlation, consecutive residuals tend to have the same sign — they cluster in runs above and below zero. This means the data contains less independent information than OLS assumes. OLS calculates standard errors as if it has T fully independent observations, but the effective sample size is smaller because nearby observations carry overlapping information. The resulting standard errors underestimate the true sampling variability of the coefficient estimates, leading to inflated t-statistics and an increased risk of Type I error (rejecting a true null hypothesis).

Disclaimer

This article is for educational and informational purposes only and does not constitute investment or financial advice. The numerical examples and test statistics presented are illustrative and based on stylized scenarios. Serial correlation diagnostics and corrections should be applied with careful consideration of the specific data and model context. Always consult relevant econometrics references and qualified professionals for research-grade analysis.