Causal Inference in Econometrics: RDD, Propensity Scores & Treatment Effects
Every empirical question in finance ultimately asks: does X cause Y? Does adopting new disclosure standards reduce a firm’s cost of capital? Does analyst coverage improve stock liquidity? Does a regulatory change increase compliance costs? Answering these questions requires more than running a regression and observing a statistically significant coefficient. The coefficient is causal only if the underlying identification strategy is credible — that is, only if the source of variation in the treatment is plausibly exogenous.
This article provides a unified framework for causal inference in econometrics. For each method, we focus on three questions: (1) what counterfactual is missing, (2) what identifying assumption fills the gap, and (3) which population the resulting estimate applies to. We cover the potential outcomes framework, randomized experiments, regression adjustment, propensity score methods, and regression discontinuity design in depth, then provide hub overviews of instrumental variables and difference-in-differences with links to their dedicated articles.
What Is Causal Inference in Econometrics? The Potential Outcomes Framework
The modern framework for causal inference is the Rubin causal model (also called the potential outcomes framework). For each unit i, we define two potential outcomes: Yi(1) is the outcome if unit i receives treatment, and Yi(0) is the outcome if it does not. The individual treatment effect is the difference Yi(1) − Yi(0).
We can never observe both Yi(1) and Yi(0) for the same unit at the same time (Holland, 1986). A firm either adopts a new reporting standard or it does not — we observe one potential outcome, never both. Every causal inference method is a strategy for constructing a credible estimate of the missing counterfactual.
Because individual effects are unobservable, we target population averages:
τATE = E[Y(1) − Y(0)]
τATT = E[Y(1) − Y(0) | W = 1]
Where:
- τATE — the average treatment effect across the entire population
- τATT — the average treatment effect on the treated subpopulation
- Y(1) — the potential outcome if the unit receives treatment
- Y(0) — the potential outcome if the unit does not receive treatment
- W — the treatment indicator (W = 1 if treated, W = 0 if not)
- E[·] — the expectation operator, averaging over the relevant population
When treatment is not randomly assigned, a simple comparison of group means does not recover the ATE or ATT. This is because of selection bias:
E[Y | W = 1] − E[Y | W = 0] = τATT + ( E[Y(0) | W = 1] − E[Y(0) | W = 0] )
Where:
- E[Y | W = 1] — the observed mean outcome for treated units
- E[Y | W = 0] — the observed mean outcome for untreated units
- τATT — the true average treatment effect on the treated
- E[Y(0) | W = 1] − E[Y(0) | W = 0] — the selection bias term: the difference in baseline potential outcomes between groups, reflecting how treated and untreated units would have differed even without treatment
This framework also requires the Stable Unit Treatment Value Assumption (SUTVA): (1) no interference between units — one firm’s treatment does not affect another firm’s outcome, and (2) no hidden versions of treatment — there is a single, well-defined treatment. When SUTVA fails (for example, if a regulation changes the competitive landscape for all firms), treatment effects are not well-defined without additional structural assumptions.
Suppose 200 firms voluntarily adopt ESG reporting (W = 1) and 800 do not (W = 0). The average ROA among adopters is 8.3%, versus 6.2% among non-adopters — a naive difference of 2.1 percentage points.
But firms that adopt ESG reporting tend to be larger, more profitable, and better-governed. These firms would likely have had higher ROA even without ESG reporting — meaning E[Y(0) | W = 1] > E[Y(0) | W = 0]. The 2.1pp gap overstates the true ATT because it includes selection bias from observable and potentially unobservable differences between adopters and non-adopters.
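The mechanics of this bias can be made concrete with a small simulation. The sketch below is purely illustrative: it assumes an unobserved "quality" factor that raises both baseline ROA and the probability of adopting ESG reporting, and an assumed true ATT of 0.8pp — values chosen to mirror the stylized numbers above, not empirical estimates.

```python
import math
import random

random.seed(0)

TRUE_ATT = 0.8  # assumed causal effect of ESG reporting on ROA, in pp

firms = []
for _ in range(10_000):
    quality = random.gauss(0, 1)                       # unobserved firm quality
    y0 = 6.0 + 1.5 * quality + random.gauss(0, 0.5)    # ROA without ESG reporting
    # selection: better firms are more likely to adopt
    treated = random.random() < 1 / (1 + math.exp(-quality))
    firms.append((treated, y0 + TRUE_ATT if treated else y0))

treated_mean = sum(y for t, y in firms if t) / sum(1 for t, _ in firms if t)
control_mean = sum(y for t, y in firms if not t) / sum(1 for t, _ in firms if not t)
naive = treated_mean - control_mean
print(f"naive difference: {naive:.2f}pp vs true ATT: {TRUE_ATT}pp")
```

Because adopters have higher baseline quality, the naive difference lands well above the true 0.8pp effect — the gap is exactly the selection bias term E[Y(0) | W = 1] − E[Y(0) | W = 0].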
Randomized Controlled Trials as the Gold Standard
A randomized controlled trial (RCT) eliminates selection bias by randomly assigning units to treatment and control groups. Under random assignment, the treatment indicator W is independent of all potential outcomes:
W ⊥ {Y(0), Y(1)}. When treatment is randomly assigned, treated and control groups are identical in expectation on all characteristics — both observed and unobserved. The selection bias term vanishes, and the simple difference in group means is an unbiased estimator of the ATE.
In finance, true RCTs are rare because researchers cannot randomly assign regulations, capital structures, exchange listings, or corporate governance rules. Ethical and legal constraints further limit experimentation. However, some finance applications involve genuine randomization:
A fintech lender randomly assigns 5,000 loan applicants to two groups: one receives the standard origination fee (2.0%) and the other receives a reduced fee (1.5%). After 12 months, the lender compares default rates between groups. Because assignment is random, any difference in default rates is causally attributable to the fee reduction — no confounders, no selection bias.
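A stylized simulation of this A/B test shows why randomization licenses the simple comparison of means. The 8% baseline default rate and −1pp fee effect below are illustrative assumptions, not figures from the example.

```python
import random

random.seed(1)

BASE_DEFAULT = 0.08   # assumed default rate under the standard 2.0% fee
FEE_EFFECT = -0.01    # assumed causal effect of the reduced 1.5% fee

defaults = {True: [], False: []}
for _ in range(5_000):
    low_fee = random.random() < 0.5                       # random assignment
    p_default = BASE_DEFAULT + (FEE_EFFECT if low_fee else 0.0)
    defaults[low_fee].append(random.random() < p_default)

rate_low = sum(defaults[True]) / len(defaults[True])       # reduced-fee arm
rate_standard = sum(defaults[False]) / len(defaults[False])  # standard-fee arm
print(f"default rates: low fee {rate_low:.3f}, standard {rate_standard:.3f}")
```

Because assignment is a coin flip, W is independent of potential outcomes and the difference in observed default rates is an unbiased estimate of the fee effect — no adjustment needed.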
True RCTs are the exception in corporate finance and regulatory research. The quasi-experimental methods below — regression adjustment, propensity score methods, RDD, instrumental variables, and difference-in-differences — provide frameworks for credible causal inference when randomization is not feasible.
Regression Adjustment
When an RCT is unavailable, regression adjustment attempts to identify causal effects by controlling for all variables that jointly determine treatment assignment and outcomes. This approach relies on two key assumptions:
1. Conditional independence (CIA): {Y(0), Y(1)} ⊥ W | X
2. Overlap (common support): 0 < P(W = 1 | X) < 1 for all X
Where:
- {Y(0), Y(1)} — the pair of potential outcomes under no treatment and treatment
- W — the treatment indicator
- X — the vector of observed pre-treatment covariates (firm size, leverage, industry, etc.)
- ⊥ — statistical independence (conditional on X in the CIA formula)
- P(W = 1 | X) — the propensity score: probability of receiving treatment given covariates
Under these assumptions, the ATE can be estimated consistently by specifying conditional mean functions for the treated and control groups and averaging the predicted treatment effect over the covariate distribution.
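A minimal sketch of this two-regression estimator follows, using simulated data with a single observed confounder x (think log assets) and an assumed true ATE of 2.0. The simple closed-form OLS and all parameter values are illustrative assumptions.

```python
import math
import random

random.seed(2)

def ols_1d(xs, ys):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

data = []
for _ in range(5_000):
    x = random.gauss(0, 1)                                # observed confounder
    w = random.random() < 1 / (1 + math.exp(-x))          # treatment depends on x
    y = 1.0 + 3.0 * x + (2.0 if w else 0.0) + random.gauss(0, 1)
    data.append((x, w, y))

# fit separate conditional mean functions for treated and control units
a1, b1 = ols_1d([x for x, w, _ in data if w], [y for _, w, y in data if w])
a0, b0 = ols_1d([x for x, w, _ in data if not w], [y for _, w, y in data if not w])

# average the predicted treatment effect over the covariate distribution
ate = sum((a1 + b1 * x) - (a0 + b0 * x) for x, _, _ in data) / len(data)
print(f"regression-adjustment ATE estimate: {ate:.2f}")  # near the assumed 2.0
```

Here the CIA holds by construction (x is the only confounder), so the estimate recovers the true effect; with an unobserved confounder added to both w and y, it would not.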
In a dual-reporting regime, some firms voluntarily adopt IFRS while others remain on local GAAP. A researcher estimates the effect of IFRS adoption on cost of equity capital by regressing cost of equity on an IFRS indicator plus controls for firm size (log assets), leverage, analyst coverage, and industry fixed effects. Sample: 300 firms. Estimated ATE: IFRS adoption reduces cost of equity by 45 basis points.
The causal interpretation hinges entirely on whether the controls capture all factors that drive both the adoption decision and cost of equity. If they do, the estimate is unbiased. If not, omitted variable bias contaminates the result.
The CIA is untestable — you can never be certain that all confounders are observed. If unobservable factors like management quality, disclosure philosophy, or governance culture drive both IFRS adoption and cost of equity, regression adjustment produces biased estimates regardless of how many controls are included. When the CIA is not credible, consider methods that exploit exogenous variation: instrumental variables, difference-in-differences, or regression discontinuity.
Propensity Score Methods
Propensity score methods offer an alternative to regression adjustment that is particularly useful when the number of covariates is large or when treated and control groups have very different covariate distributions. All propensity score methods still require the CIA — they reorganize how it is applied, but they do not relax it.
The propensity score is the conditional probability of receiving treatment given covariates:
p(X) = P(W = 1 | X)
Where:
- p(X) — the propensity score for a unit with covariates X
- W — the treatment indicator
- X — the vector of observed pre-treatment covariates used to predict treatment assignment
Propensity score matching (PSM) pairs each treated unit with one or more control units that have similar propensity scores. The ATT is estimated as the average difference in outcomes between matched treated and control units. Inverse probability weighting (IPW) takes a different approach — rather than matching, it weights each observation to create a pseudo-population in which treatment is independent of covariates:
τATE = E[ W·Y / p(X) − (1 − W)·Y / (1 − p(X)) ]
Where:
- p(X) — the estimated propensity score for a unit with covariates X
- 1 / p(X) — the weight applied to treated units to estimate the ATE
- 1 / (1 − p(X)) — the weight applied to control units to estimate the ATE
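The weighting logic can be isolated in a short sketch. To keep the example focused on the estimator rather than the estimation of p(X), the simulation below (an illustrative assumption throughout, with true ATE = 2.0) plugs in the true propensity score directly.

```python
import math
import random

random.seed(3)

rows = []
for _ in range(20_000):
    x = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-x))        # propensity score p(X), known here
    w = random.random() < p           # treatment depends on x
    y = x + (2.0 if w else 0.0) + random.gauss(0, 1)
    rows.append((p, w, y))

# tau_ATE = E[ W*Y/p(X) - (1-W)*Y/(1-p(X)) ]
ipw_ate = sum(y / p if w else -y / (1 - p) for p, w, y in rows) / len(rows)
print(f"IPW ATE estimate: {ipw_ate:.2f}")
```

In practice p(X) must itself be estimated (typically by logit or probit), which is why the weight-trimming diagnostics discussed below matter: small estimated propensities produce huge weights.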
Doubly robust estimators combine IPW with regression adjustment. The resulting estimator is consistent if either the propensity score model or the outcome regression model is correctly specified — both need not be correct simultaneously. This provides a valuable safeguard against model misspecification.
Diagnostics are essential for propensity score methods. Check common support by examining the distribution of propensity scores in treated and control groups — regions with no overlap must be trimmed. Inspect covariate balance after matching or weighting to confirm that the procedure has successfully equalized observed characteristics. Trim or winsorize extreme weights to prevent a few observations from dominating the IPW estimate.
Returning to the ESG reporting example: a logit model estimates each firm’s propensity to adopt ESG reporting based on firm size, leverage, governance score, and industry. The 200 adopters are matched 1:1 to non-adopters with similar propensity scores.
Result: ATT = +0.8% ROA, compared to the naive difference of +2.1%. Matching removed 62% of the apparent effect by accounting for observable selection. Whether the remaining 0.8% reflects a true causal effect depends on whether unobservable confounders have also been addressed — PSM does not guarantee this.
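A toy version of the matching step can be written in a few lines. The sketch below is an illustrative assumption end to end: it matches on the true propensity score rather than a logit estimate, matches 1:1 with replacement via a linear scan, and assumes a true ATT of 0.8pp to mirror the stylized result above.

```python
import math
import random

random.seed(4)

TRUE_ATT = 0.8  # assumed ROA effect of ESG adoption, in pp

treated, controls = [], []
for _ in range(3_000):
    quality = random.gauss(0, 1)
    score = 1 / (1 + math.exp(-quality))               # propensity to adopt
    roa0 = 6.0 + 1.5 * quality + random.gauss(0, 0.5)  # baseline ROA
    if random.random() < score:
        treated.append((score, roa0 + TRUE_ATT))       # adopter: observe Y(1)
    else:
        controls.append((score, roa0))                 # non-adopter: observe Y(0)

def nearest_control(score):
    # 1:1 nearest-neighbour match on the propensity score (with replacement);
    # a linear scan keeps the sketch short
    return min(controls, key=lambda c: abs(c[0] - score))[1]

att = sum(y - nearest_control(s) for s, y in treated) / len(treated)
print(f"matched ATT estimate: {att:.2f}pp")
```

Because the score here is a monotone function of the single confounder, matching on it balances quality and recovers the assumed 0.8pp ATT — the naive treated-minus-control gap in the same data would be far larger.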
Regression Discontinuity Design
Regression discontinuity design (RDD) exploits situations where treatment is assigned based on whether a continuous variable (the running variable) crosses a known cutoff. Near the cutoff, units just above and just below are nearly identical in all respects except treatment status — creating a quasi-experimental comparison.
In a sharp design, assignment is a deterministic function of the running variable:
Wi = 1[Xi ≥ c]
Where:
- Wi — the treatment indicator for unit i
- Xi — the running variable (forcing variable) for unit i
- c — the known cutoff value that determines treatment assignment
- 1[·] — the indicator function (equals 1 if the condition is true, 0 otherwise)
The RDD estimand is the jump in the conditional expectation of the outcome at the cutoff:
τc = limx↓c E[Y | X = x] − limx↑c E[Y | X = x]
Where:
- τc — the treatment effect estimated at the cutoff
- limx↓c — the limit as x approaches the cutoff from above (just above the threshold)
- limx↑c — the limit as x approaches the cutoff from below (just below the threshold)
- Y — the outcome variable
- W — the treatment indicator
- X — the running variable (forcing variable)
Bandwidth selection is critical: the researcher must choose how close to the cutoff observations must be to enter the analysis. A narrower bandwidth increases internal validity (units are more similar) but reduces sample size and precision. Local linear regression fits separate trends on each side of the cutoff.
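The local linear step can be sketched directly. The simulation below is a stylized stand-in for the SEC example that follows: the $75M cutoff, $10M bandwidth, smooth baseline trend, and −4.2bp jump are all illustrative assumptions.

```python
import random

random.seed(5)

CUTOFF, BANDWIDTH, TRUE_JUMP = 75.0, 10.0, -4.2

def fit_line(points):
    """Closed-form simple OLS on (x, y) pairs; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return my - b * mx, b

sample = []
for _ in range(4_000):
    x = random.uniform(40, 120)                       # public float, $M
    spread = 30.0 - 0.1 * x + random.gauss(0, 1)      # smooth baseline trend
    if x >= CUTOFF:
        spread += TRUE_JUMP                           # treatment effect at cutoff
    if abs(x - CUTOFF) <= BANDWIDTH:                  # keep only the local window
        sample.append((x, spread))

# recenter at the cutoff and fit separate lines on each side;
# the difference in intercepts is the jump at x = c
above = [(x - CUTOFF, y) for x, y in sample if x >= CUTOFF]
below = [(x - CUTOFF, y) for x, y in sample if x < CUTOFF]
tau = fit_line(above)[0] - fit_line(below)[0]
print(f"RDD estimate at the cutoff: {tau:.2f}bp")
```

Recentering the running variable at the cutoff makes each fitted intercept the one-sided limit at c, so their difference is the estimand τc directly.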
A key diagnostic for RDD is the continuity (no-manipulation) condition: the density of the running variable must be smooth through the cutoff (McCrary test). If units can precisely manipulate their running variable to sort above or below the cutoff, the quasi-experimental logic breaks down. Covariate continuity at the cutoff should also be verified — if pre-treatment characteristics jump at the threshold, something other than the treatment is changing.
Under SEC rules, firms meeting certain conditions must file as accelerated filers if their public float exceeds $75 million, triggering enhanced disclosure and internal control requirements. Restricting the sample to firms that already satisfy the reporting-history and revenue conditions, the $75M public float threshold provides a sharp RDD.
Running variable: public float. Cutoff: $75M. Bandwidth: $10M window ($65M to $85M). Comparing firms just above and just below the threshold, the RDD estimate suggests accelerated disclosure reduces bid-ask spreads by approximately 4.2 basis points at the cutoff.
Note: This example is stylized to illustrate RDD mechanics. In practice, accelerated filer status depends on additional conditions beyond public float alone.
RDD identifies a local treatment effect at the cutoff only. The disclosure effect estimated for firms near $75M in public float may not generalize to firms at $500M or $10M. Always be explicit about the population to which an RDD estimate applies.
Instrumental Variables and Difference-in-Differences: Hub Overview
Two of the most widely used causal inference methods in finance — instrumental variables (IV) and difference-in-differences (DiD) — have dedicated articles in this series. Here we provide a brief conceptual overview and the key links.
Instrumental Variables
IV addresses endogeneity when the CIA fails — that is, when unobservable confounders make regression adjustment unreliable. An instrument Z must satisfy four conditions: (1) relevance (Z is correlated with the treatment W), (2) independence (Z is as good as randomly assigned, conditional on covariates), (3) the exclusion restriction (Z affects the outcome Y only through its effect on W, not directly), and (4) monotonicity (the instrument shifts everyone weakly in the same direction). Under these conditions, IV identifies the Local Average Treatment Effect (LATE) — the causal effect for compliers, the subpopulation whose treatment status is changed by the instrument. LATE is generally not equal to ATE or ATT. For example, using brokerage closures as an instrument for analyst coverage estimates the effect of losing coverage for firms that would have retained it absent the closure — not the effect for all firms. For the full IV/2SLS treatment including first-stage diagnostics, weak instruments, and overidentification tests, see Instrumental Variables & 2SLS.
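With a binary instrument and binary treatment, the IV estimate reduces to the Wald ratio: the reduced-form effect of Z on Y divided by the first-stage effect of Z on W. The simulation below is a stylized sketch of the brokerage-closure setting, with assumed complier shares and an assumed LATE of −3.0; always-takers are given a different outcome level so that the naive comparison would be biased.

```python
import random

random.seed(6)

TRUE_LATE = -3.0   # assumed effect of losing coverage, for compliers only

data = []
for _ in range(20_000):
    z = random.random() < 0.5                          # instrument, as-if random
    kind = random.choice(["complier", "never", "always"])
    w = {"complier": z, "never": False, "always": True}[kind]
    # always-takers differ in levels -> E[Y|W=1]-E[Y|W=0] is confounded
    y = (2.0 if kind == "always" else 0.0) \
        + (TRUE_LATE if w else 0.0) + random.gauss(0, 1)
    data.append((z, w, y))

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

reduced = mean(y for z, _, y in data if z) - mean(y for z, _, y in data if not z)
first = mean(w for z, w, _ in data if z) - mean(w for z, w, _ in data if not z)
late = reduced / first   # Wald estimator: reduced form over first stage
print(f"Wald LATE estimate: {late:.2f}")
```

The ratio recovers the effect for compliers only — never-takers and always-takers contribute nothing to either the numerator or the denominator in expectation.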
Difference-in-Differences
DiD compares changes in outcomes over time between a group that receives treatment and a group that does not. The key assumption is parallel trends: absent treatment, both groups would have followed the same trajectory. DiD identifies the ATT for the treated group in the specific policy context. For example, comparing audit costs for accelerated filers (affected by SOX) versus non-accelerated filers (less affected) before and after 2004 estimates the ATT of SOX compliance. For the full DiD treatment including regression form, staggered adoption, Callaway-Sant’Anna estimators, and the SOX worked example, see Difference-in-Differences. For panel fixed effects as a related causal tool, see Panel Data Analysis.
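The 2×2 DiD estimator is a double subtraction over four cell means. The sketch below mirrors the SOX example in stylized form; the permanent group gap, common time trend, and assumed 0.5 treatment effect are all illustrative assumptions.

```python
import random

random.seed(7)

TRUE_ATT = 0.5   # assumed jump in (log) audit fees from SOX compliance

cells = {("T", "pre"): [], ("T", "post"): [],
         ("C", "pre"): [], ("C", "post"): []}
for _ in range(2_000):
    group = random.choice(["T", "C"])                  # accelerated filer or not
    for period in ("pre", "post"):
        level = 2.0 if group == "T" else 1.0           # permanent group gap
        trend = 0.3 if period == "post" else 0.0       # common time trend
        effect = TRUE_ATT if (group == "T" and period == "post") else 0.0
        cells[(group, period)].append(level + trend + effect
                                      + random.gauss(0, 0.2))

m = {k: sum(v) / len(v) for k, v in cells.items()}
did = (m[("T", "post")] - m[("T", "pre")]) - (m[("C", "post")] - m[("C", "pre")])
print(f"DiD ATT estimate: {did:.2f}")
```

The first difference removes the permanent gap between groups; subtracting the control group's change removes the common time trend — which is exactly why parallel trends is the load-bearing assumption.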
Causal Inference Methods in Econometrics: When to Use Each
The table below summarizes the key features of each causal inference method. The right choice depends on the source of variation available in your data and the assumptions you can credibly defend.
| Method | Key Assumption | Estimand | Target Population | Finance Application |
|---|---|---|---|---|
| RCT | Random assignment | ATE | Full population | A/B testing of loan terms |
| Regression Adjustment | CIA (selection on observables) | ATE | Full population (if overlap holds) | IFRS adoption effect on cost of equity |
| PSM / IPW | CIA + overlap | ATT (matching) or ATE (IPW) | Matched/reweighted population | ESG reporting effect on ROA |
| DiD | Parallel trends | ATT | Treated group in policy context | SOX compliance costs |
| IV / 2SLS | Relevance + independence + exclusion restriction + monotonicity | LATE | Compliers only | Analyst coverage effect on liquidity |
| RDD | Continuity at cutoff + no manipulation | Local effect at cutoff | Units near the threshold | SEC filer threshold disclosure effect |
No single method dominates. RCTs provide the strongest internal validity but are rarely feasible in finance. Regression adjustment and propensity scores are widely applicable but require the strong and untestable CIA. DiD and IV exploit specific institutional features — policy changes, natural experiments, regulatory thresholds — that provide exogenous variation without requiring selection on observables. RDD offers highly credible local estimates but cannot be extrapolated.
The choice of method depends on the institutional setting. Evaluating whether fiscal stimulus shifts aggregate demand typically requires DiD or IV with macro-level instruments. Assessing whether a new risk regulation reduces expected shortfall might use RDD if the regulation applies at a clear threshold, or DiD if it was adopted at a specific date. The best empirical studies clearly state their identification strategy, defend its assumptions, and acknowledge the limitations of their chosen method.
Common Mistakes
1. Claiming causality without a credible identification strategy. Adding control variables to an OLS regression does not establish causation. Without specifying the source of exogenous variation — randomization, an instrument, a cutoff, a natural experiment — a regression coefficient captures correlation plus omitted variable bias, not a causal effect. Always state explicitly which identification strategy you are using and defend its core assumptions.
2. Confusing propensity score matching with causal identification. Matching on observables reduces covariate imbalance, but it does not eliminate bias from unobservable confounders. PSM requires the CIA just as regression adjustment does. If unobserved factors drive both treatment and outcome, matching produces biased estimates — it is a method for improving balance, not a substitute for a valid identification assumption.
3. Controlling for post-treatment variables (bad controls). Including variables that are themselves affected by the treatment biases the estimated causal effect. For example, if studying the effect of a new listing requirement on firm value, controlling for trading volume (which is also affected by the listing requirement) absorbs part of the causal channel and distorts the estimate. Only control for pre-treatment covariates.
4. Applying RDD estimates far from the cutoff. RDD identifies a local treatment effect at the threshold. Extrapolating to units well above or below the cutoff requires strong functional form assumptions that are typically unjustifiable. A disclosure effect estimated for firms near $75M public float may not apply to firms at $500M or $10M.
5. Ignoring external validity. Every quasi-experimental method estimates a specific parameter for a specific population. LATE from IV applies to compliers only. RDD applies at the cutoff only. DiD applies to the treated group in the specific policy context. Researchers should state clearly which population the estimate covers and avoid generalizing beyond the study’s identification.
6. Treating SUTVA as automatically satisfied. SUTVA requires both no interference between units and no hidden versions of treatment. In finance, spillovers are common: SOX compliance by large firms may change auditing market dynamics for all firms; a central bank rate change affects all banks simultaneously; an exchange listing rule change alters competitive dynamics across the market. When SUTVA fails, treatment effects are not well-defined without additional structural assumptions.
Disclaimer
This article is for educational and informational purposes only and does not constitute investment or research advice. The numerical examples are illustrative and do not represent actual empirical findings. Causal inference methods require careful consideration of assumptions and institutional context. Always evaluate the credibility of the identification strategy in your specific research setting.