Causal Inference in Econometrics: RDD, Propensity Scores & Treatment Effects

Every empirical question in finance ultimately asks: does X cause Y? Does adopting new disclosure standards reduce a firm’s cost of capital? Does analyst coverage improve stock liquidity? Does a regulatory change increase compliance costs? Answering these questions requires more than running a regression and observing a statistically significant coefficient. The coefficient is causal only if the underlying identification strategy is credible — that is, only if the source of variation in the treatment is plausibly exogenous.

This article provides a unified framework for causal inference in econometrics. For each method, we focus on three questions: (1) what counterfactual is missing, (2) what identifying assumption fills the gap, and (3) which population the resulting estimate applies to. We cover the potential outcomes framework, randomized experiments, regression adjustment, propensity score methods, and regression discontinuity design in depth, then provide hub overviews of instrumental variables and difference-in-differences with links to their dedicated articles.

What Is Causal Inference in Econometrics? The Potential Outcomes Framework

The modern framework for causal inference is the Rubin causal model (also called the potential outcomes framework). For each unit i, we define two potential outcomes: Yi(1) is the outcome if unit i receives treatment, and Yi(0) is the outcome if it does not. The individual treatment effect is the difference Yi(1) − Yi(0).

The Fundamental Problem of Causal Inference

We can never observe both Yi(1) and Yi(0) for the same unit at the same time (Holland, 1986). A firm either adopts a new reporting standard or it does not — we observe one potential outcome, never both. Every causal inference method is a strategy for constructing a credible estimate of the missing counterfactual.

Because individual effects are unobservable, we target population averages:

Average Treatment Effect (ATE)
τATE = E[Y(1) − Y(0)]
The expected treatment effect averaged over the entire population — both units that would and would not receive treatment.
Average Treatment Effect on the Treated (ATT)
τATT = E[Y(1) − Y(0) | W = 1]
The expected treatment effect for the subpopulation that actually receives treatment. ATT can be identified under weaker conditions than ATE because it only requires constructing the counterfactual for treated units.

Where:

  • τATE — the average treatment effect across the entire population
  • τATT — the average treatment effect on the treated subpopulation
  • Y(1) — the potential outcome if the unit receives treatment
  • Y(0) — the potential outcome if the unit does not receive treatment
  • W — the treatment indicator (W = 1 if treated, W = 0 if not)
  • E[·] — the expectation operator, averaging over the relevant population

When treatment is not randomly assigned, a simple comparison of group means does not recover the ATE or ATT. This is because of selection bias:

Selection Bias Decomposition
E[Y | W = 1] − E[Y | W = 0] = τATT + {E[Y(0) | W = 1] − E[Y(0) | W = 0]}
The observed difference in group means equals the ATT plus a selection bias term. Selection bias arises when treated and untreated units differ systematically in their baseline potential outcomes — that is, they would have had different outcomes even without treatment.

Where:

  • E[Y | W = 1] — the observed mean outcome for treated units
  • E[Y | W = 0] — the observed mean outcome for untreated units
  • τATT — the true average treatment effect on the treated
  • E[Y(0) | W = 1] − E[Y(0) | W = 0] — the selection bias term: the difference in baseline potential outcomes between groups, reflecting how treated and untreated units would have differed even without treatment
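
The decomposition can be verified numerically. Below is a minimal simulation sketch (all numbers invented for illustration): baseline outcomes drive selection into treatment, so the naive difference in means equals the ATT plus a positive selection bias term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Baseline potential outcome Y(0); treatment adds a constant effect of 2.0
y0 = rng.normal(10.0, 2.0, n)
tau = 2.0
y1 = y0 + tau

# Selection on the baseline: units with high Y(0) are more likely to be treated
w = (y0 + rng.normal(0.0, 2.0, n) > 11.0).astype(int)

naive = y1[w == 1].mean() - y0[w == 0].mean()          # observed difference in means
att = (y1 - y0)[w == 1].mean()                         # true ATT (2.0 by construction)
selection_bias = y0[w == 1].mean() - y0[w == 0].mean()

# The identity: observed difference = ATT + selection bias
print(naive, att + selection_bias)   # the two sides of the identity match
```

Because treated units were selected on high baseline outcomes, the selection bias term is large and positive, and the naive comparison badly overstates the true effect.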

This framework also requires the Stable Unit Treatment Value Assumption (SUTVA): (1) no interference between units — one firm’s treatment does not affect another firm’s outcome, and (2) no hidden versions of treatment — there is a single, well-defined treatment. When SUTVA fails (for example, if a regulation changes the competitive landscape for all firms), treatment effects are not well-defined without additional structural assumptions.

Selection Bias Example: Voluntary ESG Reporting

Suppose 200 firms voluntarily adopt ESG reporting (W = 1) and 800 do not (W = 0). The average ROA among adopters is 8.3%, versus 6.2% among non-adopters — a naive difference of 2.1 percentage points.

But firms that adopt ESG reporting tend to be larger, more profitable, and better-governed. These firms would likely have had higher ROA even without ESG reporting — meaning E[Y(0) | W = 1] > E[Y(0) | W = 0]. The 2.1pp gap overstates the true ATT because it includes selection bias from observable and potentially unobservable differences between adopters and non-adopters.

Randomized Controlled Trials as the Gold Standard

A randomized controlled trial (RCT) eliminates selection bias by randomly assigning units to treatment and control groups. Under random assignment, the treatment indicator W is independent of all potential outcomes:

Random Assignment

W ⊥ {Y(0), Y(1)}. When treatment is randomly assigned, treated and control groups are identical in expectation on all characteristics — both observed and unobserved. The selection bias term vanishes, and the simple difference in group means is an unbiased estimator of the ATE.

In finance, true RCTs are rare because researchers cannot randomly assign regulations, capital structures, exchange listings, or corporate governance rules. Ethical and legal constraints further limit experimentation. However, some finance applications involve genuine randomization:

Finance RCT: A/B Testing of Loan Pricing

A fintech lender randomly assigns 5,000 loan applicants to two groups: one receives the standard origination fee (2.0%) and the other receives a reduced fee (1.5%). After 12 months, the lender compares default rates between groups. Because assignment is random, any difference in default rates is causally attributable to the fee reduction — no confounders, no selection bias.
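
A stylized sketch of this experiment (parameters assumed for illustration; the sample is enlarged beyond 5,000 purely for simulation precision):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000   # larger than the example's 5,000 purely for simulation precision

# Latent default risk varies across applicants (illustrative distribution)
risk = rng.uniform(0.0, 0.2, n)

# Random assignment: the reduced 1.5% fee is independent of applicant risk
low_fee = rng.integers(0, 2, n)

# Assume the lower fee cuts each applicant's default probability by about 1pp
p_default = np.clip(risk - 0.01 * low_fee, 0.0, 1.0)
default = rng.binomial(1, p_default)

# With random assignment, the difference in default rates estimates the ATE
ate_hat = default[low_fee == 1].mean() - default[low_fee == 0].mean()
print(round(ate_hat, 3))   # close to the assumed true effect of -0.01
```

No controls are needed: randomization guarantees the fee groups are comparable in expectation on risk and everything else.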

Pro Tip

True RCTs are the exception in corporate finance and regulatory research. The quasi-experimental methods below — regression adjustment, propensity score methods, RDD, instrumental variables, and difference-in-differences — provide frameworks for credible causal inference when randomization is not feasible.

Regression Adjustment

When an RCT is unavailable, regression adjustment attempts to identify causal effects by controlling for all variables that jointly determine treatment assignment and outcomes. This approach relies on two key assumptions:

Conditional Independence (Unconfoundedness)
{Y(0), Y(1)} ⊥ W | X
Treatment assignment is independent of potential outcomes, conditional on observed covariates X. After controlling for X, any remaining variation in W is as good as random. This is Wooldridge’s assumption ATE.1.
Overlap (Common Support)
0 < P(W = 1 | X) < 1   for all X
For every combination of covariates, there must be a positive probability of being in either the treatment or control group. Without overlap, we cannot construct the counterfactual for some units. This is Wooldridge’s assumption ATE.2.

Where:

  • {Y(0), Y(1)} — the pair of potential outcomes under no treatment and treatment
  • W — the treatment indicator
  • X — the vector of observed pre-treatment covariates (firm size, leverage, industry, etc.)
  • ⊥ — statistical independence (conditional on X in the CIA formula)
  • P(W = 1 | X) — the propensity score: probability of receiving treatment given covariates

Under these assumptions, the ATE can be estimated consistently by specifying conditional mean functions for the treated and control groups and averaging the predicted treatment effect over the covariate distribution.
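
A minimal numpy sketch of this estimator on simulated data (no real firms; the confounder, coefficients, and sample size are all invented): fit separate linear conditional means for treated and control units, then average the predicted difference over the full covariate distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# One observed confounder X drives both treatment take-up and the outcome
x = rng.normal(0.0, 1.0, n)
w = (x + rng.normal(0.0, 1.0, n) > 0).astype(int)
y = 1.0 + 2.0 * x + 0.5 * w + rng.normal(0.0, 1.0, n)   # true ATE = 0.5

# Fit E[Y | X, W=1] and E[Y | X, W=0] by OLS on each group separately
X1 = np.column_stack([np.ones((w == 1).sum()), x[w == 1]])
X0 = np.column_stack([np.ones((w == 0).sum()), x[w == 0]])
b1, *_ = np.linalg.lstsq(X1, y[w == 1], rcond=None)
b0, *_ = np.linalg.lstsq(X0, y[w == 0], rcond=None)

# Average the predicted treatment effect over the full covariate distribution
Xall = np.column_stack([np.ones(n), x])
ate_hat = (Xall @ b1 - Xall @ b0).mean()

# Contrast with the naive difference, which is contaminated by selection on X
naive = y[w == 1].mean() - y[w == 0].mean()
print(round(ate_hat, 2))   # ≈ 0.5
```

Here the CIA holds by construction because the only confounder is observed; with an unobserved confounder, the same code would return a biased estimate.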

Regression Adjustment Example: IFRS Adoption and Cost of Equity

In a dual-reporting regime, some firms voluntarily adopt IFRS while others remain on local GAAP. A researcher estimates the effect of IFRS adoption on cost of equity capital by regressing cost of equity on an IFRS indicator plus controls for firm size (log assets), leverage, analyst coverage, and industry fixed effects. Sample: 300 firms. Estimated ATE: IFRS adoption reduces cost of equity by 45 basis points.

The causal interpretation hinges entirely on whether the controls capture all factors that drive both the adoption decision and cost of equity. If they do, the estimate is unbiased. If not, omitted variable bias contaminates the result.

When Regression Adjustment Fails

The CIA is untestable — you can never be certain that all confounders are observed. If unobservable factors like management quality, disclosure philosophy, or governance culture drive both IFRS adoption and cost of equity, regression adjustment produces biased estimates regardless of how many controls are included. When the CIA is not credible, consider methods that exploit exogenous variation: instrumental variables, difference-in-differences, or regression discontinuity.

Propensity Score Methods

Propensity score methods offer an alternative to regression adjustment that is particularly useful when the number of covariates is large or when treated and control groups have very different covariate distributions. All propensity score methods still require the CIA — they reorganize how it is applied, but they do not relax it.

Propensity Score
p(X) = P(W = 1 | X)
The probability of receiving treatment given observed covariates, typically estimated via logit or probit regression. Under the CIA, conditioning on p(X) alone is sufficient to remove selection bias from observables (Rosenbaum and Rubin, 1983).

Where:

  • p(X) — the propensity score for a unit with covariates X
  • W — the treatment indicator
  • X — the vector of observed pre-treatment covariates used to predict treatment assignment

Propensity score matching (PSM) pairs each treated unit with one or more control units that have similar propensity scores. The ATT is estimated as the average difference in outcomes between matched treated and control units. Inverse probability weighting (IPW) takes a different approach — rather than matching, it weights each observation to create a pseudo-population in which treatment is independent of covariates:

IPW Weights (for ATE)
Treated: 1 / p(X)   |   Control: 1 / (1 − p(X))
These weights estimate the ATE. For the ATT, use weight = 1 for treated units and p(X) / (1 − p(X)) for control units.

Where:

  • p(X) — the estimated propensity score for a unit with covariates X
  • 1 / p(X) — the weight applied to treated units to estimate the ATE
  • 1 / (1 − p(X)) — the weight applied to control units to estimate the ATE
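
As a sketch of how these weights are used in practice (simulated data; the Newton-Raphson logit fit below stands in for any standard logit routine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Simulated data: covariate X shifts both treatment probability and the outcome
x = rng.normal(0.0, 1.0, n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
w = rng.binomial(1, p_true)
y = 2.0 * x + 1.0 * w + rng.normal(0.0, 1.0, n)   # true ATE = 1.0

# Estimate the propensity score with a logit fit by Newton-Raphson
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (w - p)                              # score of the logit likelihood
    hess = X.T @ (X * (p * (1 - p))[:, None])         # information matrix
    beta += np.linalg.solve(hess, grad)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))

# Trim extreme scores so no single observation dominates the weighted means
p_hat = np.clip(p_hat, 0.01, 0.99)

# IPW estimate of the ATE: weight treated by 1/p, controls by 1/(1-p)
wt_t, wt_c = w / p_hat, (1 - w) / (1 - p_hat)
ate_hat = (wt_t @ y) / wt_t.sum() - (wt_c @ y) / wt_c.sum()
print(round(ate_hat, 2))   # ≈ 1.0
```

The final line uses normalized (Hajek) weights, which are more stable in finite samples than dividing by n.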

Doubly Robust Estimation (IPWRA)

Doubly robust estimators combine IPW with regression adjustment. The resulting estimator is consistent if either the propensity score model or the outcome regression model is correctly specified — both need not be correct simultaneously. This provides a valuable safeguard against model misspecification.

Diagnostics are essential for propensity score methods. Check common support by examining the distribution of propensity scores in treated and control groups — regions with no overlap must be trimmed. Inspect covariate balance after matching or weighting to confirm that the procedure has successfully equalized observed characteristics. Trim or winsorize extreme weights to prevent a few observations from dominating the IPW estimate.

PSM Example: ESG Reporting and ROA (Continued)

Returning to the ESG reporting example: a logit model estimates each firm’s propensity to adopt ESG reporting based on firm size, leverage, governance score, and industry. The 200 adopters are matched 1:1 to non-adopters with similar propensity scores.

Result: ATT = +0.8% ROA, compared to the naive difference of +2.1%. Matching removed 62% of the apparent effect by accounting for observable selection. Whether the remaining 0.8% reflects a true causal effect depends on whether unobservable confounders have also been addressed — PSM does not guarantee this.

Regression Discontinuity Design

Regression discontinuity design (RDD) exploits situations where treatment is assigned based on whether a continuous variable (the running variable) crosses a known cutoff. Near the cutoff, units just above and just below are nearly identical in all respects except treatment status — creating a quasi-experimental comparison.

Sharp RDD
Wi = 1[Xi ≥ c]
Treatment is a deterministic function of the running variable X crossing cutoff c. Every unit above the cutoff is treated; every unit below is not. The treatment effect is identified by comparing outcomes just above and just below the threshold.

Where:

  • Wi — the treatment indicator for unit i
  • Xi — the running variable (forcing variable) for unit i
  • c — the known cutoff value that determines treatment assignment
  • 1[·] — the indicator function (equals 1 if the condition is true, 0 otherwise)
Fuzzy RDD
τc = ( limx↓c E[Y | X = x] − limx↑c E[Y | X = x] ) / ( limx↓c E[W | X = x] − limx↑c E[W | X = x] )
When crossing the cutoff changes the probability of treatment but does not determine it with certainty, the ratio of the outcome discontinuity to the treatment discontinuity identifies a LATE at the cutoff — the effect for compliers at the threshold. This is equivalent to an IV estimator using 1[X ≥ c] as the instrument for W.

Where:

  • τc — the treatment effect estimated at the cutoff
  • limx↓c — the limit as x approaches the cutoff from above (just above the threshold)
  • limx↑c — the limit as x approaches the cutoff from below (just below the threshold)
  • Y — the outcome variable
  • W — the treatment indicator
  • X — the running variable (forcing variable)

Bandwidth selection is critical: the researcher must choose how close to the cutoff observations must be to enter the analysis. A narrower bandwidth increases internal validity (units are more similar) but reduces sample size and precision. Local linear regression fits separate trends on each side of the cutoff.
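
A minimal sharp-RDD sketch, patterned loosely on the filer-threshold example in this article (simulated data; the jump size, trend, and noise are all invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Running variable: public float in $M (illustrative); cutoff at 75
x = rng.uniform(25.0, 125.0, n)
w = (x >= 75.0).astype(int)

# Outcome: bid-ask spread (bp) with a smooth trend in x and a -4.2bp jump at 75
y = 30.0 - 0.05 * x - 4.2 * w + rng.normal(0.0, 2.0, n)

# Local linear regression within a +/- 10 bandwidth, separate slope on each side
h = 10.0
s = np.abs(x - 75.0) <= h
Xd = np.column_stack([np.ones(s.sum()), w[s], x[s] - 75.0, (x[s] - 75.0) * w[s]])
b, *_ = np.linalg.lstsq(Xd, y[s], rcond=None)

tau_hat = b[1]            # coefficient on the treatment indicator = jump at the cutoff
print(round(tau_hat, 1))  # ≈ -4.2
```

The interaction term lets the slope differ on each side of the cutoff, so the estimate is the vertical gap between the two fitted lines at 75, not a pooled trend.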

The identifying assumption for RDD is continuity: potential outcomes must evolve smoothly through the cutoff. This fails if units can precisely manipulate their running variable to sort above or below the threshold, which breaks the quasi-experimental logic. The standard diagnostic is the McCrary density test, which checks that the density of the running variable is smooth through the cutoff; bunching on one side signals manipulation. Covariate continuity at the cutoff should also be verified: if pre-treatment characteristics jump at the threshold, something other than the treatment is changing.

RDD Example: SEC Accelerated Filer Threshold

Under SEC rules, firms meeting certain conditions must file as accelerated filers if their public float exceeds $75 million, triggering enhanced disclosure and internal control requirements. Restricting the sample to firms that already satisfy the reporting-history and revenue conditions, the $75M public float threshold provides a sharp RDD.

Running variable: public float. Cutoff: $75M. Bandwidth: $10M window ($65M to $85M). Comparing firms just above and just below the threshold, the RDD estimate suggests accelerated disclosure reduces bid-ask spreads by approximately 4.2 basis points at the cutoff.

Note: This example is stylized to illustrate RDD mechanics. In practice, accelerated filer status depends on additional conditions beyond public float alone.

Pro Tip

RDD identifies a local treatment effect at the cutoff only. The disclosure effect estimated for firms near $75M in public float may not generalize to firms at $500M or $10M. Always be explicit about the population to which an RDD estimate applies.

Instrumental Variables and Difference-in-Differences: Hub Overview

Two of the most widely used causal inference methods in finance — instrumental variables (IV) and difference-in-differences (DiD) — have dedicated articles in this series. Here we provide a brief conceptual overview and the key links.

Instrumental Variables

IV addresses endogeneity when the CIA fails — that is, when unobservable confounders make regression adjustment unreliable. An instrument Z must satisfy four conditions: (1) relevance (Z is correlated with the treatment W), (2) independence (Z is as good as randomly assigned, conditional on covariates), (3) the exclusion restriction (Z affects the outcome Y only through its effect on W, not directly), and (4) monotonicity (the instrument shifts everyone weakly in the same direction). Under these conditions, IV identifies the Local Average Treatment Effect (LATE) — the causal effect for compliers, the subpopulation whose treatment status is changed by the instrument. LATE is generally not equal to ATE or ATT. For example, using brokerage closures as an instrument for analyst coverage estimates the effect of losing coverage for firms that would have retained it absent the closure — not the effect for all firms. For the full IV/2SLS treatment including first-stage diagnostics, weak instruments, and overidentification tests, see Instrumental Variables & 2SLS.
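
The logic can be sketched with a single binary instrument and binary treatment, where 2SLS collapses to the Wald ratio. A simulation under assumed parameters (not the brokerage-closure study) shows IV recovering the true effect while the naive comparison is confounded:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Unobserved confounder u drives both treatment and outcome, so the CIA fails
u = rng.normal(0.0, 1.0, n)
z = rng.integers(0, 2, n)                                   # instrument: as-good-as-random
w = (0.8 * z + u + rng.normal(0.0, 1.0, n) > 0.5).astype(int)
y = 1.0 * w + 2.0 * u + rng.normal(0.0, 1.0, n)             # true effect = 1.0 for all units

# Wald estimator: outcome difference by Z over treatment difference by Z
wald = (y[z == 1].mean() - y[z == 0].mean()) / (w[z == 1].mean() - w[z == 0].mean())

# Naive comparison by treatment status: inflated by the confounder
naive = y[w == 1].mean() - y[w == 0].mean()
print(round(wald, 1), round(naive, 1))   # wald near 1.0; naive much larger
```

Because the simulated effect is homogeneous, LATE, ATT, and ATE coincide here; with heterogeneous effects the Wald ratio would recover the complier effect only.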

Difference-in-Differences

DiD compares changes in outcomes over time between a group that receives treatment and a group that does not. The key assumption is parallel trends: absent treatment, both groups would have followed the same trajectory. DiD identifies the ATT for the treated group in the specific policy context. For example, comparing audit costs for accelerated filers (affected by SOX) versus non-accelerated filers (less affected) before and after 2004 estimates the ATT of SOX compliance. For the full DiD treatment including regression form, staggered adoption, Callaway-Sant’Anna estimators, and the SOX worked example, see Difference-in-Differences. For panel fixed effects as a related causal tool, see Panel Data Analysis.
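
The 2x2 DiD comparison can be sketched in a few lines. All magnitudes below are invented; the point is that differencing removes both the permanent group gap and the common time trend:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4_000   # firm-period observations

group = np.repeat([1, 0], n // 2)   # 1 = accelerated filers (treated by SOX)
post = np.tile([0, 1], n // 2)      # 0 = pre-2004, 1 = post-2004

# Audit costs: permanent group gap + common time trend + 0.3 treatment effect
y = (1.0 * group + 0.5 * post + 0.3 * group * post
     + rng.normal(0.0, 0.5, n))

# DiD: (treated post - treated pre) - (control post - control pre)
did = ((y[(group == 1) & (post == 1)].mean() - y[(group == 1) & (post == 0)].mean())
       - (y[(group == 0) & (post == 1)].mean() - y[(group == 0) & (post == 0)].mean()))
print(round(did, 2))   # ≈ 0.3: the group gap and time trend are differenced out
```

The parallel trends assumption holds by construction here (both groups share the same 0.5 trend); in real applications it must be defended, typically with pre-treatment event-study plots.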

Causal Inference Methods in Econometrics: When to Use Each

The table below summarizes the key features of each causal inference method. The right choice depends on the source of variation available in your data and the assumptions you can credibly defend.

Method | Key Assumption | Estimand | Target Population | Finance Application
RCT | Random assignment | ATE | Full population | A/B testing of loan terms
Regression Adjustment | CIA (selection on observables) | ATE | Full population (if overlap holds) | IFRS adoption effect on cost of equity
PSM / IPW | CIA + overlap | ATT (matching) or ATE (IPW) | Matched/reweighted population | ESG reporting effect on ROA
DiD | Parallel trends | ATT | Treated group in policy context | SOX compliance costs
IV / 2SLS | Relevance + independence + exclusion restriction + monotonicity | LATE | Compliers only | Analyst coverage effect on liquidity
RDD | Continuity at cutoff + no manipulation | Local effect at cutoff | Units near the threshold | SEC filer threshold disclosure effect

No single method dominates. RCTs provide the strongest internal validity but are rarely feasible in finance. Regression adjustment and propensity scores are widely applicable but require the strong and untestable CIA. DiD and IV exploit specific institutional features — policy changes, natural experiments, regulatory thresholds — that provide exogenous variation without requiring selection on observables. RDD offers highly credible local estimates but cannot be extrapolated.

The choice of method depends on the institutional setting. Evaluating whether fiscal stimulus shifts aggregate demand typically requires DiD or IV with macro-level instruments. Assessing whether a new risk regulation reduces expected shortfall might use RDD if the regulation applies at a clear threshold, or DiD if it was adopted at a specific date. The best empirical studies clearly state their identification strategy, defend its assumptions, and acknowledge the limitations of their chosen method.

Common Mistakes

1. Claiming causality without a credible identification strategy. Adding control variables to an OLS regression does not establish causation. Without specifying the source of exogenous variation — randomization, an instrument, a cutoff, a natural experiment — a regression coefficient captures correlation plus omitted variable bias, not a causal effect. Always state explicitly which identification strategy you are using and defend its core assumptions.

2. Confusing propensity score matching with causal identification. Matching on observables reduces covariate imbalance, but it does not eliminate bias from unobservable confounders. PSM requires the CIA just as regression adjustment does. If unobserved factors drive both treatment and outcome, matching produces biased estimates — it is a method for improving balance, not a substitute for a valid identification assumption.

3. Controlling for post-treatment variables (bad controls). Including variables that are themselves affected by the treatment biases the estimated causal effect. For example, if studying the effect of a new listing requirement on firm value, controlling for trading volume (which is also affected by the listing requirement) absorbs part of the causal channel and distorts the estimate. Only control for pre-treatment covariates.

4. Applying RDD estimates far from the cutoff. RDD identifies a local treatment effect at the threshold. Extrapolating to units well above or below the cutoff requires strong functional form assumptions that are typically unjustifiable. A disclosure effect estimated for firms near $75M public float may not apply to firms at $500M or $10M.

5. Ignoring external validity. Every quasi-experimental method estimates a specific parameter for a specific population. LATE from IV applies to compliers only. RDD applies at the cutoff only. DiD applies to the treated group in the specific policy context. Researchers should state clearly which population the estimate covers and avoid generalizing beyond the study’s identification.

6. Treating SUTVA as automatically satisfied. SUTVA requires both no interference between units and no hidden versions of treatment. In finance, spillovers are common: SOX compliance by large firms may change auditing market dynamics for all firms; a central bank rate change affects all banks simultaneously; an exchange listing rule change alters competitive dynamics across the market. When SUTVA fails, treatment effects are not well-defined without additional structural assumptions.

Frequently Asked Questions

What is causal inference in econometrics?

Causal inference is the process of determining whether a change in one variable actually causes a change in another, rather than merely being correlated with it. Econometrics provides a toolkit of methods — including randomized controlled trials, regression adjustment, propensity score matching, difference-in-differences, instrumental variables, and regression discontinuity design — each with different assumptions for isolating causal effects from observational data. The key challenge is constructing a credible counterfactual: what would have happened to treated units had they not been treated?

What is selection bias, and why doesn't correlation imply causation?

Selection bias arises when the units that receive treatment differ systematically from those that do not, even in the absence of treatment. For example, firms that voluntarily adopt ESG reporting may already be more profitable, so the observed correlation between ESG reporting and higher ROA partly reflects these pre-existing differences rather than a causal effect of reporting. Correlation does not imply causation because the observed association may be driven by confounding variables, reverse causality, or selection into treatment. Causal inference methods are designed to isolate the treatment effect by addressing these sources of bias.

What is the difference between the ATE and the ATT?

The Average Treatment Effect (ATE) measures the expected effect of treatment across the entire population, including units that would never actually receive treatment. The Average Treatment Effect on the Treated (ATT) measures the effect only for those who actually receive treatment. The two differ whenever treatment effects are heterogeneous and selection into treatment is non-random. For example, the ATT of IFRS adoption on cost of equity reflects the effect for firms that chose to adopt; the ATE would reflect what would happen if all firms adopted, including those that chose not to. Policy analysis often focuses on the ATT (effect for those affected by the policy), while ATE is relevant for universal interventions.

What is the Local Average Treatment Effect (LATE)?

The Local Average Treatment Effect (LATE), introduced by Imbens and Angrist (1994), is the causal effect for compliers — units whose treatment status is changed by the instrument. In an IV analysis, LATE is the only estimand that can be identified without stronger assumptions about effect homogeneity. LATE differs from ATE (effect for everyone) and ATT (effect for the treated) because it applies only to the complier subpopulation, which is generally unobservable. For example, using brokerage closures as an instrument for analyst coverage, the LATE is the effect of losing coverage for firms that lost it because of the closure — not the effect for all firms or all firms that lost coverage.

When should I use propensity score matching versus regression adjustment?

Both methods require the conditional independence assumption (selection on observables). Propensity score matching is preferred when the functional form of the outcome model is uncertain, when there are many covariates, or when treated and control groups have very different covariate distributions — matching forces common support and makes the comparison transparent. Regression adjustment is simpler and more statistically efficient when the outcome model is well-specified and covariate distributions overlap substantially. Doubly robust estimators (IPWRA) combine both approaches and are consistent if either model is correct, providing a practical safeguard against misspecification.

When is regression discontinuity design applicable?

Regression discontinuity design (RDD) exploits a known cutoff in a continuous assignment variable that determines treatment eligibility. It is applicable whenever a policy or rule assigns treatment based on whether a variable exceeds a threshold — such as SEC filer status based on public float, index inclusion based on market capitalization, or regulatory capital requirements based on bank size. RDD compares outcomes for units just above and just below the cutoff, where differences in the running variable are negligible. The resulting estimate has high internal validity but applies locally at the threshold only — it cannot be extrapolated to units far from the cutoff without additional assumptions.

How does causal inference differ from prediction?

Prediction models optimize out-of-sample forecasting accuracy and do not require causal interpretations of coefficients — a model can predict stock returns well using correlated signals without any variable being causal. Causal inference requires identifying a credible source of exogenous variation and estimating the effect of changing one variable while holding all else fixed. The distinction matters for policy: a predictive model might show that firms with ESG reports have higher ROA, but only a causal analysis can tell you whether mandating ESG reporting would actually increase ROA. Different goals require different methods.

Disclaimer

This article is for educational and informational purposes only and does not constitute investment or research advice. The numerical examples are illustrative and do not represent actual empirical findings. Causal inference methods require careful consideration of assumptions and institutional context. Always evaluate the credibility of the identification strategy in your specific research setting.