Instrumental Variables & Two-Stage Least Squares: Solving Endogeneity
When a financial researcher estimates the effect of analyst coverage on stock liquidity, or the impact of firm leverage on profitability, Ordinary Least Squares (OLS) can produce misleading results if the explanatory variable is correlated with the error term. This is the endogeneity problem — and it is pervasive in empirical finance. Instrumental variables (IV) estimation and two-stage least squares (2SLS) provide a powerful solution by exploiting an external source of variation that affects the endogenous variable but has no direct effect on the outcome. This guide covers the three sources of endogeneity, what makes a valid instrument, the IV estimator, 2SLS mechanics, weak instruments, testing procedures, and the local average treatment effect (LATE) interpretation.
The Endogeneity Problem
Endogeneity occurs when an explanatory variable in a regression model is correlated with the error term. When this happens, OLS estimates are biased and inconsistent — increasing the sample size does not fix the problem.
A variable X is endogenous when Cov(X, u) ≠ 0, where u is the error term. OLS requires Cov(X, u) = 0 for consistent estimation. When this assumption fails, the OLS coefficient does not have a causal interpretation.
There are three primary sources of endogeneity in financial research:
1. Omitted Variables — A relevant variable is left out of the regression and correlates with both X and Y. For example, estimating the effect of R&D spending on firm profitability while omitting managerial quality. Better managers invest in better projects (affecting R&D) and generate higher profits (affecting ROA), biasing the R&D coefficient. See Omitted Variable Bias for a full treatment of this problem.
2. Measurement Error — The explanatory variable is measured with noise, creating correlation between the measured variable and the error term. For example, using self-reported leverage ratios that contain reporting errors or timing mismatches between book values and market values.
3. Simultaneity — X causes Y and Y causes X at the same time. In commodity markets, price and quantity are determined simultaneously by supply and demand. Regressing quantity on price yields biased estimates because price is itself endogenous — it responds to quantity demanded.
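The claim that a larger sample does not cure endogeneity can be checked with a small simulation. The sketch below is illustrative only: the data-generating process, the coefficients 0.8 and 0.5, and the function names are assumptions made for this example. An unobserved factor q (think managerial quality) drives both X and the error term, so the OLS slope settles near 1 + 0.4/1.64 ≈ 1.24 instead of the true value of 1, however large n becomes.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def simulate_ols(n, true_beta=1.0, seed=0):
    """Simulate Y = true_beta * X + u where an unobserved factor q enters both X and u."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n)              # omitted variable (hypothetical, e.g. managerial quality)
    x = 0.8 * q + rng.normal(size=n)    # X depends on q
    u = 0.5 * q + rng.normal(size=n)    # the error term also contains q, so Cov(X, u) != 0
    y = true_beta * x + u
    return ols_slope(x, y)

# The estimate does not move toward the true value of 1.0 as n grows.
for n in (1_000, 100_000):
    print(n, round(simulate_ols(n), 3))
```

In this design Cov(X, u) = 0.8 × 0.5 = 0.4 and Var(X) = 0.64 + 1 = 1.64, which is where the 0.4/1.64 bias term comes from.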
Research question: Does analyst coverage increase stock liquidity (reduce the bid-ask spread)?
Problem: Analysts tend to cover stocks that are already liquid — institutional investors demand research on liquid names, and brokerages allocate analysts to stocks with high trading volume. This creates reverse causality: more liquid stocks attract more analysts, and more analysts may improve liquidity.
Consequence: An OLS regression of liquidity on analyst coverage overstates the true causal effect because coverage is endogenous. The OLS coefficient captures both the causal effect of coverage on liquidity and the selection effect of analysts choosing liquid stocks.
This is exactly the type of problem instrumental variables are designed to solve.
What Is an Instrumental Variable?
An instrumental variable (IV) is an external variable Z that provides exogenous variation in the endogenous explanatory variable X. A valid instrument must satisfy two conditions:
Relevance condition:
$$\mathrm{Cov}(Z, X) \neq 0$$
Where:
- Z — the instrumental variable (proposed instrument)
- X — the endogenous explanatory variable
- Cov(Z, X) — the covariance between the instrument and the endogenous variable; must be nonzero for the instrument to have predictive power
Exogeneity condition (exclusion restriction):
$$\mathrm{Cov}(Z, u) = 0$$
Where:
- Z — the instrumental variable
- u — the error term in the structural equation
- Cov(Z, u) — the covariance between the instrument and the error term; must equal zero for the instrument to be valid (the exclusion restriction)
There is a critical asymmetry between these conditions. Relevance is testable — regress X on Z and check whether Z is statistically significant. Exogeneity is not directly testable — it must be justified by economic reasoning and institutional knowledge. This is the exclusion restriction: the instrument affects the outcome only through the endogenous variable, not through any other channel.
Common instrument strategies in finance research include:
- Regulatory changes — Sarbanes-Oxley (SOX) as an instrument for changes in audit quality or corporate governance practices. The regulation exogenously shifts firm behavior.
- Industry-level averages — Industry median leverage as an instrument for an individual firm’s leverage. Firms tend toward industry norms, but the industry average is plausibly unaffected by any single firm’s profitability. Caveat: industry-wide shocks that affect both leverage and profitability can violate the exclusion restriction, so this strategy requires careful justification.
- Geographic distance — Distance from a firm’s headquarters to the nearest financial center as an instrument for institutional ownership or analyst coverage.
- Lagged variables — Lagged trading volume as an instrument for current liquidity. This strategy is valid only when the error term is serially uncorrelated; if serial correlation is present, lagged variables fail the exogeneity condition. Always test for serial correlation before relying on lagged instruments.
A good instrument tells a compelling economic story. Before running any regression, you should be able to explain in plain language why the instrument affects X (relevance) and why it has no direct effect on Y other than through X (the exclusion restriction). If you cannot articulate this story, the instrument is unlikely to be credible — no amount of statistical testing can substitute for economic reasoning.
The IV Estimator
In the simplest case — one endogenous variable X, one instrument Z, and no additional controls — the IV estimator replaces OLS’s reliance on all variation in X with only the variation driven by Z:
$$\hat{\beta}_{IV} = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}$$
Where:
- β̂IV — the instrumental variables estimate of the slope coefficient
- Cov(Z, Y) — the reduced-form covariance between the instrument and the outcome variable
- Cov(Z, X) — the first-stage covariance between the instrument and the endogenous explanatory variable
- Z — the instrumental variable
- Y — the outcome (dependent) variable
- X — the endogenous explanatory variable
Intuition: Start from the structural equation Y = β0 + β1X + u. Take the covariance of both sides with Z. Because Cov(Z, u) = 0 by the exogeneity assumption, we get Cov(Z, Y) = β1 × Cov(Z, X). Solving for β1 yields the IV estimator. The key insight is that IV isolates only the exogenous variation in X — the portion driven by Z — and discards the contaminated variation correlated with the error term.
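The ratio-of-covariances logic can be tried on simulated data. This is a minimal sketch under assumed values: the instrument strength of 0.6, the confounder loadings, and the helper name iv_slope are all hypothetical. Because Z is drawn independently of the confounder q, dividing Cov(Z, Y) by Cov(Z, X) should land close to the true slope of 1, while OLS should land near 1.2.

```python
import numpy as np

def iv_slope(z, x, y):
    """Simple IV estimator for one regressor and one instrument: Cov(Z, Y) / Cov(Z, X)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(42)
n = 100_000
q = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument, independent of q and of the noise terms
x = 0.6 * z + 0.8 * q + rng.normal(size=n)    # endogenous regressor
u = 0.5 * q + rng.normal(size=n)              # structural error, correlated with x through q
y = 1.0 * x + u                               # true beta1 = 1.0

beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
beta_iv = iv_slope(z, x, y)
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```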
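Asymptotic variance of the IV estimator:
$$\mathrm{Var}(\hat{\beta}_{IV}) = \frac{\sigma^2}{n \, \sigma_X^2 \, \rho_{XZ}^2}$$
Where: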
- Var(β̂IV) — the asymptotic variance of the IV estimator
- σ² — the variance of the structural error term u
- n — the sample size
- σX² — the variance of the endogenous explanatory variable X
- ρXZ² — the squared correlation between the endogenous variable X and the instrument Z; smaller values mean weaker instruments and larger IV variance
This formula reveals the fundamental cost of IV estimation: the denominator includes ρXZ², the squared correlation between the instrument and the endogenous variable. When this correlation is small (a weak instrument), the variance of the IV estimator becomes very large relative to OLS, resulting in wide confidence intervals and imprecise estimates.
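To put numbers on that cost, the sketch below evaluates the variance expression σ² / (n · σX² · ρXZ²) next to its OLS counterpart σ² / (n · σX²). The inputs (σ² = 1, n = 1,000, σX² = 1) are hypothetical and chosen only to make the ratio easy to read. The ratio of standard errors works out to 1/ρXZ.

```python
import math

def iv_asymptotic_se(sigma2, n, var_x, rho_xz):
    """Asymptotic standard error of the simple IV slope: sqrt(sigma^2 / (n * var_x * rho^2))."""
    return math.sqrt(sigma2 / (n * var_x * rho_xz ** 2))

def ols_asymptotic_se(sigma2, n, var_x):
    """Asymptotic standard error of the OLS slope under homoskedasticity."""
    return math.sqrt(sigma2 / (n * var_x))

# Hypothetical inputs, not taken from any data set.
sigma2, n, var_x = 1.0, 1_000, 1.0
for rho in (0.5, 0.1):
    ratio = iv_asymptotic_se(sigma2, n, var_x, rho) / ols_asymptotic_se(sigma2, n, var_x)
    print(f"rho_XZ = {rho}: IV standard error is {ratio:.1f}x the OLS standard error")
```

With ρXZ = 0.1 the IV standard error is ten times the OLS one, so a sample would need to be 100 times larger to recover the same precision.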
Model: ROAi = β0 + β1 × Leveragei + ui
Problem: Leverage is endogenous — profitable firms may choose lower leverage (reverse causality), and omitted factors like growth opportunities affect both leverage and profitability.
Instrument: Industry median leverage (Z). Firms tend toward industry capital structure norms, but the industry median is plausibly unaffected by any single firm’s profitability.
| Statistic | Value |
|---|---|
| Cov(Z, ROA) | −0.0042 |
| Cov(Z, Leverage) | 0.0156 |
| β̂IV = −0.0042 / 0.0156 | −0.269 |
| OLS estimate (for comparison) | −0.180 |
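Interpretation: The IV estimate implies that a one-unit increase in leverage lowers ROA by about 0.269 units, compared with an OLS estimate of −0.180. The direction of the gap deserves care. Reverse causality from profitable firms choosing lower leverage would, on its own, make the OLS coefficient more negative than the true effect, not less. An OLS estimate that sits closer to zero than the IV estimate therefore points to an offsetting bias, such as attenuation from measurement error in leverage or an omitted factor that raises both leverage and profitability. The sign of the OLS-IV gap depends on which source of endogeneity dominates.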
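The arithmetic behind the table is a single division. The snippet below simply reproduces it from the hypothetical covariances shown in the example; the variable names are made up for illustration.

```python
cov_z_roa = -0.0042   # Cov(Z, ROA) from the example table
cov_z_lev = 0.0156    # Cov(Z, Leverage) from the example table
beta_ols = -0.180     # OLS estimate reported for comparison

beta_iv = cov_z_roa / cov_z_lev
print(f"IV estimate:  {beta_iv:.3f}")             # -0.269
print(f"IV minus OLS: {beta_iv - beta_ols:.3f}")  # -0.089
```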
Two-Stage Least Squares (2SLS)
When there are multiple instruments, multiple exogenous regressors, or both, the simple IV estimator generalizes to two-stage least squares (2SLS). As the name suggests, estimation proceeds in two stages.
The first stage isolates the exogenous variation in the endogenous variable by regressing it on all instruments and controls. The second stage regresses the outcome on the fitted values from the first stage, which are purged of the variation correlated with the error term, to estimate the causal effect.
First stage:
$$X_i = \pi_0 + \pi_1 Z_{1i} + \pi_2 Z_{2i} + \gamma_1 W_{1i} + v_i$$
Where:
- Xi — the endogenous explanatory variable for observation i
- π0 — the first-stage intercept
- π1, π2 — coefficients on the excluded instruments Z1 and Z2
- Z1i, Z2i — excluded instruments (variables that affect X but are excluded from the structural equation)
- γ1 — coefficient on the exogenous control variable
- W1i — exogenous control variable (included in both stages)
- vi — first-stage error term
- X̂ — fitted values from the first-stage regression (the predicted exogenous component of X)
Second stage:
$$Y_i = \beta_0 + \beta_1 \hat{X}_i + \beta_2 W_{1i} + u_i$$
Where:
- Yi — the outcome (dependent) variable for observation i
- β0 — the second-stage intercept
- β1 — the 2SLS estimate of the causal effect of X on Y
- X̂i — the fitted value from the first stage, replacing the endogenous X with its predicted exogenous component
- β2 — coefficient on the exogenous control variable
- W1i — exogenous control variable (same as in the first stage)
- ui — structural error term
Critical implementation details:
- The first stage must include all exogenous variables from the structural equation, not just the instruments
- With fewer excluded instruments than endogenous variables (underidentified), the model is not identified — there is insufficient exogenous variation to estimate the parameters
- With exactly one excluded instrument for one endogenous variable (just-identified), 2SLS reduces to the simple IV estimator
- With more excluded instruments than endogenous variables (overidentified), 2SLS efficiently combines all instruments
Never run 2SLS manually as two separate OLS regressions and report the second-stage standard errors. The standard errors from a manual second stage are incorrect because they treat X̂ as observed data rather than an estimate. Always use dedicated 2SLS commands in your statistical software (e.g., ivreg2 in Stata, IV2SLS in Python’s linearmodels, ivreg in R).
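To see why the manual second-stage standard errors are wrong, the sketch below computes 2SLS by hand with NumPy and reports two sets of homoskedastic standard errors: one using residuals built from the actual X (what dedicated routines do) and one using residuals built from X̂ (what a naive second-stage OLS would report). It is a teaching sketch with simulated data and hypothetical names, assuming one endogenous regressor; it is not a substitute for the packaged commands.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for a design matrix X that already includes a constant."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def two_sls(y, x_endog, W, Z):
    """2SLS with one endogenous regressor.
    W: exogenous controls including a constant (n x p); Z: excluded instruments (n x q).
    Returns (beta, correct_se, naive_se); the endogenous coefficient comes first."""
    n = len(y)
    first_stage = np.column_stack([Z, W])            # instruments AND controls
    x_hat = first_stage @ ols(first_stage, x_endog)  # first-stage fitted values
    X_hat = np.column_stack([x_hat, W])
    X_act = np.column_stack([x_endog, W])
    beta = ols(X_hat, y)                             # second-stage coefficients
    k = X_hat.shape[1]
    xtx_inv_diag = np.diag(np.linalg.inv(X_hat.T @ X_hat))
    resid_correct = y - X_act @ beta                 # residuals use the actual X
    resid_naive = y - X_hat @ beta                   # residuals a manual second stage would use
    se_correct = np.sqrt(resid_correct @ resid_correct / (n - k) * xtx_inv_diag)
    se_naive = np.sqrt(resid_naive @ resid_naive / (n - k) * xtx_inv_diag)
    return beta, se_correct, se_naive

# Simulated data (all coefficients are assumptions for illustration).
rng = np.random.default_rng(0)
n = 20_000
q = rng.normal(size=n)                               # unobserved confounder
Z = rng.normal(size=(n, 2))                          # two excluded instruments
w = rng.normal(size=n)                               # exogenous control
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.3 * w + 0.8 * q + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * w + 0.5 * q + rng.normal(size=n)
W = np.column_stack([np.ones(n), w])

beta, se_correct, se_naive = two_sls(y, x, W, Z)
print("2SLS coefficient on x:", round(beta[0], 3))
print("correct s.e.:", round(se_correct[0], 4), " naive s.e.:", round(se_naive[0], 4))
```

The point estimates from the manual route are fine; only the residual variance, and therefore the standard errors, differ. Whether the naive figure is too large or too small depends on the design.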
Structural equation: Spreadi = β0 + β1 × Coveragei + β2 × MarketCapi + β3 × Volatilityi + ui
Instruments: (1) Brokerage firm closures — an exogenous shock that reduces coverage for affected stocks, and (2) Distance from firm headquarters to the nearest financial center — affects analyst access but not liquidity directly.
First stage: Coveragei = π0 + π1 × BrokerClosurei + π2 × Distancei + π3 × MarketCapi + π4 × Volatilityi + vi
| Result | OLS | 2SLS |
|---|---|---|
| Effect of coverage on spread (β̂1) | −0.051 | −0.032 |
| Standard error | 0.008 | 0.014 |
| First-stage F-statistic | — | 18.4 |
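Interpretation: Each additional analyst reduces the bid-ask spread by 3.2 basis points (2SLS), compared to the OLS estimate of 5.1 basis points. The OLS estimate is larger in magnitude because analysts preferentially cover already-liquid stocks. The 2SLS standard error (0.014) is almost twice the OLS standard error (0.008), which is the precision cost of relying only on instrument-driven variation. The first-stage F of 18.4 exceeds the rule-of-thumb threshold of 10, which suggests the instruments are not weak; it says nothing about whether they satisfy the exclusion restriction.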
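The precision trade-off in the table can be made concrete by turning each coefficient and standard error into a t-statistic and an approximate 95% confidence interval. The helper below is a hypothetical convenience function that uses the normal critical value 1.96; the inputs are the illustrative figures from the table.

```python
def summarize(beta, se, z_crit=1.96):
    """Return the t-statistic and a normal-approximation 95% confidence interval."""
    return beta / se, (beta - z_crit * se, beta + z_crit * se)

for name, beta, se in (("OLS", -0.051, 0.008), ("2SLS", -0.032, 0.014)):
    t, (lo, hi) = summarize(beta, se)
    print(f"{name}: t = {t:.2f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The 2SLS interval is noticeably wider than the OLS interval but still excludes zero in this example.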
Weak Instruments
An instrument is weak when the correlation between Z and X is small — the instrument barely moves the endogenous variable. Weak instruments are one of the most serious practical problems in IV estimation.
Rule of thumb for instrument strength:
$$F_{\text{first stage}} > 10$$
Where:
- First-Stage F-statistic — the F-statistic from a joint significance test of all excluded instruments in the first-stage regression of X on Z and controls
- 10 — the Staiger-Stock (1997) rule-of-thumb threshold; instruments with F below this value are considered weak
The consequences of weak instruments are severe:
- Bias toward OLS — The IV estimator exhibits finite-sample bias toward the OLS estimate, undermining the entire purpose of the correction
- Unreliable standard errors — Confidence intervals have incorrect coverage rates, meaning you cannot trust hypothesis tests
- Distorted t-statistics — You may reject or fail to reject the null hypothesis incorrectly
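- Worse than OLS — With many weak instruments, the 2SLS bias approaches the OLS bias, and even a slight correlation between the instruments and the error term is magnified, so the IV inconsistency can exceed that of OLS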
A first-stage F-statistic of 3 or 5 may appear statistically significant, but it is far too low for credible IV inference. The Stock-Yogo (2005) critical values provide more precise thresholds for specific bias tolerances, but F > 10 remains the widely used benchmark in empirical finance.
If your instruments are weak, consider these alternatives: (1) find a stronger instrument with a more direct economic link to the endogenous variable, (2) use the Limited Information Maximum Likelihood (LIML) estimator — an alternative to 2SLS that is less biased with weak instruments, or (3) report the Anderson-Rubin confidence set — a weak-IV-robust inference procedure that provides valid hypothesis tests and confidence intervals regardless of instrument strength. LIML and Anderson-Rubin serve different purposes: LIML is a point estimator, while Anderson-Rubin provides robust inference.
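The first-stage F-statistic is a joint test on the excluded instruments only, not the overall regression F. The sketch below computes it by comparing the first stage with and without the instruments. The function name, the simulated data, and the coefficient values (0.5 for the strong case, 0.03 for the weak case) are assumptions for illustration.

```python
import numpy as np

def first_stage_f(x, Z, W):
    """F-statistic for the joint significance of the excluded instruments Z
    in the regression of x on [Z, W]. W must include a constant column."""
    n, q = Z.shape
    full = np.column_stack([Z, W])
    rss_full = np.sum((x - full @ np.linalg.lstsq(full, x, rcond=None)[0]) ** 2)
    rss_restricted = np.sum((x - W @ np.linalg.lstsq(W, x, rcond=None)[0]) ** 2)
    df_resid = n - full.shape[1]
    return ((rss_restricted - rss_full) / q) / (rss_full / df_resid)

rng = np.random.default_rng(7)
n = 500
Z = rng.normal(size=(n, 2))            # two candidate instruments
W = np.ones((n, 1))                    # constant only, no other controls
noise = rng.normal(size=n)
x_strong = Z @ np.array([0.5, 0.5]) + noise     # instruments move x substantially
x_weak = Z @ np.array([0.03, 0.03]) + noise     # instruments barely move x

f_strong = first_stage_f(x_strong, Z, W)
f_weak = first_stage_f(x_weak, Z, W)
print(f"strong instruments: F = {f_strong:.1f}")
print(f"weak instruments:   F = {f_weak:.1f}")
```

This version assumes homoskedastic errors. With heteroskedastic or clustered data, a robust first-stage statistic from a dedicated package is the safer choice.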
Testing for Endogeneity and Overidentification
The Durbin-Wu-Hausman (DWH) Test for Endogeneity
Before committing to IV estimation, you should test whether endogeneity is actually present. If X is exogenous, OLS is preferred because it is more efficient. The Durbin-Wu-Hausman (DWH) test — distinct from the Hausman specification test used for fixed vs. random effects in panel data — compares OLS and IV estimates to detect endogeneity.
H0: Cov(X, u) = 0 (X is exogenous, use OLS)
H1: Cov(X, u) ≠ 0 (X is endogenous, use IV)
Where:
- H0 — the null hypothesis that X is exogenous (Cov(X, u) = 0)
- H1 — the alternative hypothesis that X is endogenous (Cov(X, u) ≠ 0)
- X — the potentially endogenous explanatory variable
- u — the structural error term
- Cov(X, u) — the covariance between the explanatory variable and the error term; nonzero indicates endogeneity
The practical implementation uses the variable addition test:
1. Estimate the first-stage regression of X on Z and all controls. Save the residuals v̂.
2. Add v̂ as an additional regressor in the structural equation and estimate by OLS.
3. Test whether the coefficient on v̂ is statistically significant (a t-test when there is one endogenous variable).
4. If significant, treat X as endogenous and use IV. If not significant, the data do not reject exogeneity and OLS is the more efficient choice, provided the instruments are themselves valid.
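The variable addition procedure can be sketched in a few lines of NumPy. The code below is an illustration under assumed conditions (one endogenous regressor, homoskedastic errors, simulated data with hypothetical coefficients); the function names are invented for this example.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, resid

def dwh_t_stat(y, x, Z, W):
    """t-statistic on the first-stage residual v_hat when it is added to the structural equation."""
    _, _, v_hat = ols_fit(np.column_stack([Z, W]), x)             # first stage, keep residuals
    beta, se, _ = ols_fit(np.column_stack([x, W, v_hat]), y)      # augmented structural equation
    return beta[-1] / se[-1]                                      # test on the v_hat coefficient

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=(n, 1))                     # instrument
W = np.ones((n, 1))                             # constant only
q = rng.normal(size=n)                          # unobserved factor
x = 0.7 * z[:, 0] + 0.8 * q + rng.normal(size=n)
y_endog = 1.0 * x + 0.5 * q + rng.normal(size=n)   # q sits in the error, so x is endogenous
y_exog = 1.0 * x + rng.normal(size=n)              # error is independent of x

t_endog = dwh_t_stat(y_endog, x, z, W)
t_exog = dwh_t_stat(y_exog, x, z, W)
print(f"endogenous case: t = {t_endog:.2f}")
print(f"exogenous case:  t = {t_exog:.2f}")
```

A large absolute t-statistic in the first case signals that OLS and IV would give systematically different answers. In the second case the statistic behaves like a draw from a standard normal.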
The Overidentification Test (Sargan / Hansen J-Test)
When the model is overidentified (more excluded instruments than endogenous variables), the overidentification test checks whether all instruments satisfy the exogeneity condition. This test is not available in the just-identified case.
Under the null hypothesis that all instruments are exogenous:
$$J = n R^2 \sim \chi^2(q - k)$$
Where:
- J — the Sargan-Hansen test statistic for overidentifying restrictions
- n — the sample size
- R² — the R-squared from regressing the 2SLS residuals on all instruments and exogenous controls
- χ²(q − k) — the chi-squared distribution with q − k degrees of freedom
- q — the number of excluded instruments
- k — the number of endogenous regressors
- q − k — the number of overidentifying restrictions (excess instruments beyond what is needed for identification)
If the J-test rejects, at least one instrument appears to be correlated with the error term, but the test does not identify which one; the decision about which instrument to drop or replace has to rest on economic reasoning. If the J-test does not reject, the overidentifying restrictions are consistent with instrument validity, though the test has limited power and cannot detect a violation shared by all instruments.
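The n · R² construction can be sketched as follows for one endogenous regressor and two excluded instruments, so that there is one overidentifying restriction. The simulated data, the coefficient values, and the function name are assumptions. The p-value line uses the identity P(χ²(1) > J) = erfc(√(J/2)), which holds only for one degree of freedom.

```python
import math
import numpy as np

def sargan_j(y, x, Z, W):
    """Sargan statistic J = n * R^2 from regressing the 2SLS residuals on all
    instruments and controls. One endogenous regressor x; Z holds the excluded
    instruments (n x q, q >= 2); W holds the controls including a constant."""
    n = len(y)
    inst = np.column_stack([Z, W])
    x_hat = inst @ np.linalg.lstsq(inst, x, rcond=None)[0]
    beta = np.linalg.lstsq(np.column_stack([x_hat, W]), y, rcond=None)[0]
    resid = y - np.column_stack([x, W]) @ beta            # 2SLS residuals use the actual x
    fitted = inst @ np.linalg.lstsq(inst, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return max(n * r2, 0.0)                               # guard against tiny negative rounding

rng = np.random.default_rng(3)
n = 5_000
Z = rng.normal(size=(n, 2))
W = np.ones((n, 1))
q = rng.normal(size=n)
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.8 * q + rng.normal(size=n)
y_valid = 1.0 * x + 0.5 * q + rng.normal(size=n)     # both instruments excluded from y
y_invalid = y_valid + 0.3 * Z[:, 1]                  # second instrument affects y directly

j_valid = sargan_j(y_valid, x, Z, W)
j_invalid = sargan_j(y_invalid, x, Z, W)
for label, j in (("valid", j_valid), ("invalid", j_invalid)):
    p = math.erfc(math.sqrt(j / 2.0))                # chi-squared p-value, 1 degree of freedom
    print(f"{label} instruments: J = {j:.2f}, p = {p:.4f}")
```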
| Test | Null Hypothesis | Rejection Means | When to Use |
|---|---|---|---|
| DWH Test | X is exogenous | Use IV instead of OLS | Always — before committing to IV |
| First-Stage F | Instruments are irrelevant | Instruments are relevant | Always — check instrument strength |
| Sargan-Hansen J | All instruments are valid | At least one instrument is invalid | Only when overidentified (q > k) |
The Local Average Treatment Effect (LATE)
When IV is applied in a treatment-effect framework — estimating the causal impact of a binary or discrete treatment — an important subtlety arises. IV does not necessarily estimate the average treatment effect (ATE) for the entire population. When the treatment effect varies across individuals or firms, IV estimates the local average treatment effect (LATE) — the causal effect for the specific subset whose behavior is changed by the instrument.
The LATE Interpretation (Angrist-Imbens Framework): When IV is used in a treatment-effect framework with a binary instrument (e.g., a regulatory change), it estimates the causal effect specifically for compliers — units that change their treatment status because of the instrument. This interpretation requires a monotonicity assumption (no defiers — no unit does the opposite of what the instrument predicts). It does not capture effects for always-takers or never-takers.
| Subgroup | Definition | Finance Example: ESG Disclosure Mandate | Contributes to LATE? |
|---|---|---|---|
| Compliers | Change treatment because of the instrument | Firms that begin ESG reporting only because the regulation requires it | Yes |
| Always-Takers | Would adopt treatment regardless | Firms that already published ESG reports voluntarily | No |
| Never-Takers | Do not adopt even with the instrument | Firms that fail to comply despite the mandate | No |
| Defiers | Do the opposite of what the instrument predicts | Ruled out by the monotonicity assumption — a required condition for LATE identification | No |
The practical implication is that the IV estimate reflects the effect for firms at the margin of compliance — not for firms that would have disclosed regardless. LATE equals ATE only when the treatment effect is homogeneous across all units. In most finance applications, effects are heterogeneous, so researchers should be explicit about which population the IV estimate represents. For a broader overview of causal inference methods including difference-in-differences and other approaches, see Causal Inference in Econometrics.
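A simulation makes the complier interpretation tangible. In the sketch below, the population shares (50% compliers, 25% always-takers, 25% never-takers) and the treatment effects (2.0, 5.0, and 0.5 respectively) are invented for illustration, and monotonicity holds by construction because no defiers are generated. With a binary instrument, the IV estimate is the Wald ratio of the difference in mean outcomes to the difference in treatment rates.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.25, 0.25])
z = rng.integers(0, 2, size=n)                                   # binary instrument (e.g. mandate applies)
d = (types == "always") | ((types == "complier") & (z == 1))     # treatment actually taken
# Heterogeneous treatment effects by type (hypothetical values).
effect = np.select([types == "complier", types == "always"], [2.0, 5.0], default=0.5)
y = 1.0 + effect * d + rng.normal(size=n)

# Wald / IV estimator with a binary instrument.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = effect.mean()                                              # population average effect
print(f"IV (Wald) estimate: {wald:.2f}")
print(f"Population ATE:     {ate:.2f}")
```

The Wald ratio recovers the complier effect of 2.0 rather than the population average of roughly 2.4, because always-takers and never-takers do not change treatment status when the instrument switches on.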
OLS vs. IV Estimation
The choice between OLS and IV involves a fundamental trade-off between efficiency and consistency:
OLS Estimation
- Consistency: Consistent only if Cov(X, u) = 0
- Efficiency: Most efficient when assumptions hold (BLUE)
- Precision: Smaller standard errors
- Bias: Biased and inconsistent if endogeneity present
- Interpretation: Consistent for population effect only under exogeneity
- Best for: When X is plausibly exogenous
IV / 2SLS Estimation
- Consistency: Consistent even when Cov(X, u) ≠ 0
- Efficiency: Less efficient than OLS (larger variance)
- Precision: Larger standard errors — the cost of using IV
- Bias: Corrects endogeneity bias (if instruments valid)
- Interpretation: In treatment-effect settings, estimates LATE (compliers); in linear models, a consistent causal effect
- Best for: When X is endogenous and valid instruments exist
The bottom line: IV sacrifices precision for consistency. If X is truly exogenous, OLS is preferred: both estimators are consistent for the same parameter, and OLS delivers tighter confidence intervals. If endogeneity is present, IV provides a consistent estimate at the cost of wider confidence intervals. The DWH test helps determine which estimator to use. Simultaneity is common in finance. Monetary policy and asset prices, for example, respond to each other, so credible causal inference in that setting generally requires IV or another identification strategy.
Limitations of Instrumental Variables
While IV estimation solves the endogeneity problem, it comes with important limitations that researchers should acknowledge:
Finding a valid instrument is the hardest part of IV estimation. Credible instruments must satisfy two conditions simultaneously — relevance and exogeneity — and the exogeneity condition cannot be directly tested. Many published IV studies have been criticized for using instruments with questionable exclusion restrictions.
1. Valid instruments are hard to find — The exclusion restriction requires that the instrument affects the outcome only through the endogenous variable. In practice, most candidate instruments have plausible direct effects on the outcome, making them invalid. This is a fundamental constraint, not a technical one.
2. Less precise than OLS — IV estimates generally have larger standard errors than OLS, sometimes substantially so. This means wider confidence intervals and lower statistical power. When the instruments are weak, this imprecision can render the results uninformative.
3. LATE may differ from ATE — In treatment-effect settings, IV estimates the local average treatment effect for compliers, not the average effect for the entire population. If the complier subpopulation is small or unrepresentative, the LATE may not generalize to the policy-relevant population.
4. Exogeneity cannot be directly tested — The Sargan-Hansen J-test can detect some violations when the model is overidentified, but it has limited power and cannot detect violations that affect all instruments equally. In the just-identified case, there is no test at all — the exclusion restriction relies entirely on economic reasoning.
5. Sensitive to instrument choice — Different instruments can produce materially different IV estimates, especially when treatment effects are heterogeneous (since each instrument identifies a different LATE). Researchers should be transparent about how instrument choice affects results.
Common Mistakes
Instrumental variables estimation is powerful but frequently misapplied. Here are the most common errors in practice:
1. Using instruments that violate the exclusion restriction — The most critical error. An instrument that directly affects Y (not just through X) produces inconsistent IV estimates that may be worse than OLS. For example, using firm size as an instrument for analyst coverage — but firm size also directly affects stock liquidity through trading volume. Always justify the exclusion restriction with economic reasoning before running any regressions.
2. Ignoring the first-stage F-statistic — Running 2SLS without verifying instrument strength. If the first-stage F is below 10, the IV estimates may be more biased than OLS, with unreliable confidence intervals. Always report the first-stage F-statistic and interpret it explicitly.
3. Assuming IV eliminates all bias — IV corrects for the specific endogeneity addressed by the instrument, but other sources of bias (additional omitted variables, measurement error in other regressors) may remain. IV is not a silver bullet — it addresses one source of endogeneity at a time.
4. Using manual second-stage standard errors — Running two separate OLS regressions and reporting the standard errors from the second regression. These standard errors are incorrect, and can be either too small or too large, because the second-stage residuals are computed from the fitted values X̂ rather than the actual X. Always use dedicated 2SLS software commands that compute corrected standard errors automatically.
5. Including too many instruments — With a large number of instruments relative to the sample size, the first stage can overfit, causing 2SLS estimates to exhibit finite-sample bias toward OLS. This is a bias problem distinct from instrument validity — even if all instruments are truly exogenous, having too many of them degrades the quality of inference. Keep the number of excluded instruments modest relative to the sample size. Separately, use the overidentification test (Sargan/Hansen J) to verify that excluded instruments satisfy the exogeneity condition.
6. Treating any correlated variable as a valid instrument — A variable that is correlated with the endogenous regressor is not automatically a valid instrument. Relevance is necessary but not sufficient — the exclusion restriction must also hold. For example, a firm’s total assets may predict analyst coverage (relevance), but total assets also directly affect stock liquidity (violating exogeneity). Always articulate the economic reasoning for the exclusion restriction before relying on an instrument.
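The many-instruments problem in item 5 can be illustrated with a small Monte Carlo. The sketch below compares 2SLS using one genuine instrument against 2SLS using the same instrument plus 50 irrelevant but fully exogenous "junk" instruments, in samples of 200. All settings (sample size, number of replications, coefficients, function names) are assumptions chosen for illustration. The true slope is 1 and OLS is biased upward in this design.

```python
import numpy as np

def tsls_slope(y, x, Z):
    """2SLS slope for y = b0 + b1 * x + u using instruments Z (a constant is added here)."""
    n = len(y)
    inst = np.column_stack([np.ones(n), Z])
    x_hat = inst @ np.linalg.lstsq(inst, x, rcond=None)[0]
    X_hat = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

def median_estimate(n_junk, reps=200, n=200, seed=5):
    """Median 2SLS slope across simulated samples with n_junk irrelevant instruments added."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(reps):
        q = rng.normal(size=n)                       # unobserved confounder
        z_real = rng.normal(size=n)                  # the one relevant instrument
        junk = rng.normal(size=(n, n_junk))          # exogenous but irrelevant instruments
        x = 0.5 * z_real + 0.8 * q + rng.normal(size=n)
        y = 1.0 * x + 0.5 * q + rng.normal(size=n)   # true slope = 1.0
        estimates.append(tsls_slope(y, x, np.column_stack([z_real, junk])))
    return float(np.median(estimates))

one_iv = median_estimate(n_junk=0)
many_iv = median_estimate(n_junk=50)
print(f"median 2SLS, 1 instrument:   {one_iv:.3f}")
print(f"median 2SLS, 51 instruments: {many_iv:.3f}")
```

Every junk instrument here satisfies the exogeneity condition, yet adding them pulls the estimate away from 1 and toward the OLS value, because the first stage overfits the endogenous part of X.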
Disclaimer
This article is for educational and informational purposes only and does not constitute financial or investment advice. Formulas, examples, and test statistics are presented for illustrative purposes using hypothetical data. Always consult the original textbook references and qualified professionals for implementation in research or practice.