The Linear Probability Model: When the Dependent Variable Is Binary

The linear probability model is the simplest econometric approach to modeling a binary outcome. When a credit analyst needs to estimate the probability that a borrower will default, the most direct method is to apply ordinary least squares (OLS) to a dependent variable coded as 0 (no default) or 1 (default). The result is a model whose fitted values can be read as predicted probabilities and whose coefficients have an unusually clean interpretation. The LPM has real advantages — and well-known limitations that every practitioner should understand. For the nonlinear alternatives that address those limitations, see our guide on logit and probit models.

What Is the Linear Probability Model?

The linear probability model applies OLS regression to a binary (0/1) dependent variable. Because Y can only be 0 or 1, the conditional expectation E(Y | X) equals the probability that Y = 1 given X. The LPM models this probability as a linear function of the regressors.

Key Concept

In the linear probability model, E(Y | X) = P(Y = 1 | X) is modeled as a linear function of the independent variables. Each regression coefficient gives the change in the probability of Y = 1 for a one-unit increase in the corresponding regressor, holding all else constant; multiplied by 100, it reads as a percentage-point change.

Linear Probability Model
E(Y | X) = P(Y = 1 | X) = β0 + β1X1 + β2X2 + … + βkXk
The conditional probability of Y = 1 is a linear function of the regressors

Where:

  • E(Y | X) = P(Y = 1 | X) — the probability that the outcome equals 1, given the values of X
  • Y — binary dependent variable (0 or 1)
  • β0 — intercept
  • β1, …, βk — slope coefficients, each measuring the effect of a one-unit change in the corresponding X on P(Y = 1)
  • X1, …, Xk — independent variables (regressors)

The LPM is a special case of multiple regression where the dependent variable happens to be binary. Note that having a binary dependent variable is distinct from using dummy variables as independent regressors — in the LPM, it is the outcome itself that takes only two values.
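To make the mechanics concrete, here is a minimal sketch that fits an LPM by ordinary least squares on simulated loan data. All variable names and coefficient values are invented for illustration; only plain NumPy is used, since OLS has a closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
credit_score = rng.uniform(500, 850, n)   # hypothetical regressor
dti = rng.uniform(5, 50, n)               # hypothetical debt-to-income ratio (%)

# Assumed true probability model, used only to simulate the binary outcome
p_true = np.clip(0.55 - 0.0008 * credit_score + 0.0045 * dti, 0.01, 0.99)
y = (rng.uniform(size=n) < p_true).astype(float)   # 0/1 default indicator

# LPM: run OLS with the 0/1 outcome as the dependent variable
X = np.column_stack([np.ones(n), credit_score, dti])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ beta   # fitted values, read as predicted probabilities

print(beta)                        # intercept, credit-score slope, DTI slope
print(p_hat.min(), p_hat.max())    # fitted values may stray outside [0, 1]
```

Note that nothing in the estimation constrains `p_hat` to the unit interval, which previews the boundary problem discussed below.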

Interpreting Linear Probability Model Coefficients

LPM coefficients have a direct, intuitive interpretation that makes the model attractive for applied work.

Continuous regressors. The coefficient βj represents the change in P(Y = 1) for a one-unit increase in Xj, holding all other variables constant. This is a percentage-point change, not a percent change. For example, if β = −0.0008 on credit score in a loan default model, a 1-point increase in credit score reduces default probability by 0.08 percentage points. A 100-point improvement (say, from 650 to 750) reduces default probability by 8.0 percentage points.

Binary regressors. When Xj is a 0/1 dummy variable, the coefficient represents the ceteris paribus difference in predicted probability between the two groups. For instance, a “secured loan” dummy with coefficient −0.05 means secured loans have a predicted default probability 5 percentage points lower than unsecured loans, holding borrower characteristics constant.

Constant marginal effects. The LPM imposes the same absolute effect regardless of the baseline probability. A 100-point credit score improvement has the same 8.0-percentage-point effect whether the borrower starts at 5% or 50% default risk. This linearity is a simplification — near the boundaries of 0 and 1, marginal effects should logically compress toward zero. For models with diminishing marginal effects near the boundaries, see logit and probit models.
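The percentage-point arithmetic above is simply the coefficient times the change in X, scaled by 100. A quick check using the credit-score coefficient quoted in the text:

```python
beta_score = -0.0008   # LPM coefficient on credit score (per point), from the text

# Change in default probability for a 100-point score improvement
delta_prob = beta_score * 100   # change in probability units

# Expressed in percentage points (multiply the probability change by 100)
delta_pp = delta_prob * 100
print(delta_pp)   # -8.0 percentage points
```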

Pro Tip

LPM coefficients are marginal effects by construction — no additional computation needed. This is one of the model’s main practical advantages over logit and probit, where marginal effects must be calculated separately at a chosen evaluation point or averaged across the sample.

Advantages of the Linear Probability Model

Despite its limitations, the LPM remains widely used in applied econometrics for several good reasons:

  • Direct coefficient interpretation. Coefficients are marginal effects. There is no need to compute average marginal effects (AME) or marginal effects at the mean (MEM) as with logit or probit. What you estimate is what you interpret.
  • Computational simplicity. OLS has a closed-form solution — no iterative maximum likelihood estimation is required. R² remains valid, and joint hypothesis tests can be conducted using heteroskedasticity-robust Wald statistics available in all standard software.
  • Works reasonably well in the interior. When predicted probabilities cluster roughly between 0.2 and 0.8 (a rough heuristic, not a hard rule), LPM marginal effects often approximate logit and probit average partial effects closely.
  • Compatible with causal inference designs. The LPM is standard in difference-in-differences and instrumental variable settings because adding a nonlinear link function can complicate the identification argument without meaningfully changing the estimated effects.

It is worth noting that the main concerns with the LPM are functional form and inference — not coefficient bias. Under correct exogeneity assumptions (Wooldridge’s MLR.1–MLR.4), LPM coefficients remain unbiased for the linear projection, regardless of the functional form limitations described below.

Limitations of the Linear Probability Model

The LPM has four well-known problems, all stemming from applying a linear function to a probability that is inherently bounded between 0 and 1.

Important Limitation

If a substantial share of your sample has predicted probabilities outside [0, 1], the linear specification is a poor fit for the data. Always report the fraction of out-of-range predictions as a diagnostic when using the LPM.

1. Predicted probabilities outside [0, 1]. Nothing constrains the linear function to produce values between 0 and 1. With extreme regressor values — a very high credit score combined with low debt — the model can predict −0.05 or 1.12. These have no meaning as probabilities.

2. Inherent heteroskedasticity. Because the dependent variable is binary, the conditional variance of Y given X takes a specific form that depends on the regressors:

Heteroskedasticity in the LPM
Var(Y | X) = p(X)[1 − p(X)]
The error variance depends on X whenever any slope coefficient is nonzero

Because p(X) varies with X by construction, the error variance is non-constant — the model is heteroskedastic. This means standard OLS standard errors are invalid for inference. For the general treatment of heteroskedasticity and its consequences, see our guide on heteroskedasticity.

3. Non-normal errors. The error term in the LPM can take only two values: (1 − p(X)) when Y = 1 and −p(X) when Y = 0. This is clearly not normally distributed. However, with large samples, the central limit theorem ensures asymptotic normality of the OLS estimator, so this is primarily a small-sample concern.

4. Constant marginal effects. The LPM forces each regressor to have the same absolute effect on the probability regardless of the baseline. A credit score improvement should logically have a smaller effect on default probability for a borrower already at 2% risk than for one at 40% risk. The LPM cannot capture this diminishing-returns pattern.

When to Use the LPM vs. Logit or Probit

The choice between the LPM and nonlinear alternatives depends on the research goal (causal effect estimation vs. probability prediction), the distribution of predicted probabilities, and interpretive needs.

Linear Probability Model

  • Link function: None (identity)
  • Coefficients: Direct marginal effects (constant)
  • Predictions: Can fall below 0 or above 1
  • Estimation: OLS (closed-form)
  • Best when: Causal inference (DiD, IV), probabilities in interior, exploratory analysis
  • Weakness: Unreliable at extreme probabilities

Logit (Logistic Regression)

  • Link function: Logistic CDF
  • Coefficients: Log-odds; marginal effects computed separately
  • Predictions: Always in (0, 1)
  • Estimation: MLE (iterative)
  • Best when: Probability prediction matters, odds-ratio interpretation valued, calibrated scores needed
  • Weakness: Marginal effects depend on evaluation point

Probit

  • Link function: Normal CDF
  • Coefficients: Index units; marginal effects computed separately
  • Predictions: Always in (0, 1)
  • Estimation: MLE (iterative)
  • Best when: Latent-variable theoretical framework, academic convention in some fields
  • Weakness: Same as logit; no odds-ratio interpretation

In practice, all three models often produce similar average partial effects when predicted probabilities cluster in the interior (roughly 0.2 to 0.8 — a heuristic, not a hard rule). The choice matters most when modeling outcomes near the probability boundaries. Important: raw logit and probit coefficients are not directly comparable to LPM coefficients. To compare results across models, compute marginal effects. For full coverage of logit and probit estimation, see our guide on logit and probit models.
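The non-comparability of raw coefficients follows from the logit marginal-effect formula, β·p·(1 − p). A short sketch with a hypothetical logit coefficient shows both the compression toward zero near the boundaries and why the raw log-odds coefficient overstates the probability effect:

```python
beta_logit = 1.2   # hypothetical logit coefficient (log-odds per unit of X)

# Logit marginal effect at probability p is beta * p * (1 - p):
# largest at p = 0.5, compressed toward zero near 0 and 1
effects = {p: beta_logit * p * (1 - p) for p in (0.05, 0.50, 0.95)}
for p, me in effects.items():
    print(f"p = {p:.2f}  ->  marginal effect = {me:.4f}")
```

Even at its maximum (p = 0.5), the marginal effect is only a quarter of the raw coefficient, which is why logit and probit output must be converted to marginal effects before being set beside LPM estimates.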

Robust Standard Errors for the Linear Probability Model

Because the LPM is inherently heteroskedastic — Var(Y | X) = p(X)[1 − p(X)] depends on X by construction — standard OLS standard errors are invalid. The t-statistics, p-values, and confidence intervals produced by default OLS output cannot be trusted.

The solution: heteroskedasticity-robust (White) standard errors. These adjust the variance-covariance matrix using observation-specific squared residuals rather than a common σ². Robust standard errors are the best-practice default for LPM inference because heteroskedasticity is guaranteed with binary outcomes. Their justification is asymptotic — they rely on large-sample properties, so they are most reliable with moderate to large sample sizes.

WLS as a theoretical alternative. In principle, one could apply weighted least squares by weighting each observation by 1/√[p̂(1 − p̂)], where p̂ is the fitted probability. In practice, when any fitted probability falls outside [0, 1], the quantity p̂(1 − p̂) becomes nonpositive, making the implied WLS weights invalid. This practical complication makes WLS unreliable for many LPM applications.
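The White sandwich estimator can be sketched by hand in a few lines. This is a simplified illustration on simulated data (the data-generating process and sample size are invented); in practice one would use a statistics package's built-in robust covariance option, but the computation is just the formula (X′X)⁻¹ X′ diag(eᵢ²) X (X′X)⁻¹ with a degrees-of-freedom correction for the HC1 variant:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 1, n)
p = 0.2 + 0.6 * x                          # assumed true linear probability
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical variance sigma^2 (X'X)^-1 -- invalid for the LPM, shown for contrast
sigma2 = resid @ resid / (n - k)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# White sandwich with HC1 degrees-of-freedom correction:
# (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1 * n/(n-k)
meat = (X * resid[:, None] ** 2).T @ X
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se_hc1 = np.sqrt(np.diag(cov_hc1))

print("conventional SE:", se_ols)
print("robust (HC1) SE:", se_hc1)
```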

Pro Tip

Heteroskedasticity-robust standard errors are the standard choice for LPM inference. They are valid whether or not heteroskedasticity takes the exact Bernoulli form, and with binary outcomes, heteroskedasticity is always present. For the full treatment of robust standard errors, Breusch-Pagan testing, and HC variants, see our guide on heteroskedasticity.

LPM Example: Loan Default Prediction

To see how the linear probability model works in practice, consider a bank that estimates the probability of loan default using data from 12,500 consumer loans.

Loan Default LPM: Estimated Results

The bank models P(Default = 1 | X) as a linear function of credit score, debt-to-income ratio, and loan amount. All standard errors are heteroskedasticity-robust.

Variable                      Coefficient   Robust SE   t-Statistic
Intercept                       0.5480      0.042         13.05
Credit Score (per point)       −0.0008      0.00006      −13.33
DTI Ratio (%)                   0.0045      0.0009         5.00
Loan Amount (per $10,000)       0.0032      0.0010         3.20

Coefficient interpretation:

  • Credit Score: Each 1-point increase reduces predicted default probability by 0.08 percentage points. A 100-point improvement (e.g., 650 → 750) reduces default probability by 8.0 percentage points.
  • DTI Ratio: Each 1-percentage-point increase in debt-to-income ratio raises default probability by 0.45 percentage points.
  • Loan Amount: Each additional $10,000 in loan principal increases default probability by 0.32 percentage points.

Predicted Probabilities for Three Borrower Profiles

Conservative Homebuyer (Credit Score = 720, DTI = 30%, Loan = $250,000):

P̂ = 0.5480 + (−0.0008)(720) + (0.0045)(30) + (0.0032)(25)
P̂ = 0.5480 − 0.5760 + 0.1350 + 0.0800 = 0.187 (18.7%)

Within the [0, 1] range — the LPM produces a sensible prediction for this moderate-risk profile.

Prime Borrower (Credit Score = 780, DTI = 18%, Loan = $150,000):

P̂ = 0.5480 + (−0.0008)(780) + (0.0045)(18) + (0.0032)(15)
P̂ = 0.5480 − 0.6240 + 0.0810 + 0.0480 = 0.053 (5.3%)

Still in range, but approaching the lower boundary where logit or probit would be more reliable.

Excellent-Credit Borrower (Credit Score = 800, DTI = 12%, Loan = $100,000):

P̂ = 0.5480 + (−0.0008)(800) + (0.0045)(12) + (0.0032)(10)
P̂ = 0.5480 − 0.6400 + 0.0540 + 0.0320 = −0.006 (−0.6%)

Negative — meaningless as a probability. This illustrates the LPM’s boundary problem for very low-risk borrowers. A logit model would produce a small positive probability instead.
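The three predictions above can be reproduced directly from the estimated coefficients. A small sketch using the values from the table (loan amounts entered in $10,000 units, matching the coefficient's scale):

```python
# Coefficients from the loan-default LPM in the text
b0, b_score, b_dti, b_loan = 0.5480, -0.0008, 0.0045, 0.0032

def lpm_predict(score, dti, loan_10k):
    """Linear fitted value, read as a probability -- unbounded, so it can leave [0, 1]."""
    return b0 + b_score * score + b_dti * dti + b_loan * loan_10k

profiles = {
    "conservative": lpm_predict(720, 30, 25),   # $250,000 loan = 25 units
    "prime":        lpm_predict(780, 18, 15),
    "excellent":    lpm_predict(800, 12, 10),
}
for name, p in profiles.items():
    print(f"{name}: {p:.3f}")
```

The excellent-credit profile comes out negative, reproducing the boundary failure discussed above.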

Corporate Bond Default: LPM in Fixed Income

A fixed-income research team applies the LPM to 8,200 investment-grade and high-yield corporate bonds issued between 2010 and 2023, modeling the probability of default within five years. Their estimated model (with robust standard errors) finds:

  • Interest coverage ratio: coefficient = −0.012 — each additional unit of coverage reduces five-year default probability by 1.2 percentage points
  • Leverage ratio (Debt/Assets): coefficient = 0.18 — a 10-percentage-point increase in leverage raises default probability by 1.8 percentage points
  • High-yield dummy (1 = below BBB−): coefficient = 0.09 — high-yield issuers have a predicted default probability 9 percentage points higher than investment-grade issuers, holding financials constant

For a BBB-rated industrial firm with an interest coverage ratio of 5.0 and leverage of 0.45, the model predicts a five-year default probability of approximately 4.2%. The constant-marginal-effect limitation matters here: the same model would assign an implausibly negative default probability to a AAA-rated firm with very low leverage, reinforcing the case for logit or probit when modeling high-quality credits.

For the broader credit risk framework — including probability of default within regulatory and Basel contexts — see our guide on credit risk and probability of default.

Common Mistakes

These are the most frequent errors practitioners make when working with the linear probability model.

1. Using standard OLS standard errors. Heteroskedasticity is built into any LPM with nonzero slope coefficients. Reporting standard OLS standard errors produces invalid t-statistics and p-values. Always use heteroskedasticity-robust standard errors.

2. Ignoring out-of-range predicted probabilities. When a nontrivial share of observations have fitted values below 0 or above 1, the linear specification is poorly matched to the data. Always report the fraction of out-of-range predictions. If the share is substantial, consider logit or probit.

3. Using the LPM when bounded probabilities or calibration quality is the objective. The LPM estimates marginal effects well, but it is generally not the best choice when the goal is to produce well-calibrated probability scores for individual observations. Logit and probit produce predictions that are bounded in (0, 1) and are typically better calibrated, especially in the tails.

4. Assuming constant marginal effects are always a problem. For many causal-inference questions — such as whether a regulatory change affects corporate bond default rates — the average partial effect is the quantity of interest. The LPM’s constant-effect assumption approximates this well in many settings. The linearity becomes most problematic when modeling outcomes near the 0 or 1 boundaries.

5. Confusing percentage points with percent changes. An LPM coefficient of 0.0045 means a 0.45 percentage-point increase in default probability, not a 0.45% increase. At a 10% baseline default rate, 0.45 percentage points represents a 4.5% relative increase — these are very different numbers. Always specify percentage points when reporting LPM results.
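The diagnostic from mistake 2 takes only a line or two to compute. A sketch using a small array of hypothetical fitted values:

```python
import numpy as np

# Hypothetical LPM fitted values for six observations
p_hat = np.array([-0.006, 0.053, 0.187, 0.42, 0.81, 1.12])

# Fraction of predictions outside the [0, 1] interval
out_of_range = (p_hat < 0) | (p_hat > 1)
share = out_of_range.mean()
print(f"{share:.1%} of predictions fall outside [0, 1]")
```

Reporting this share alongside the coefficient table lets readers judge how badly the linear specification strains against the probability bounds.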

Frequently Asked Questions

What is a linear probability model?

A linear probability model is ordinary least squares (OLS) regression applied to a binary (0/1) outcome variable. The fitted values are interpreted as predicted probabilities: they estimate how likely the outcome is to equal 1 given the values of the independent variables. Each coefficient tells you the change in that probability — measured in percentage points — for a one-unit increase in the corresponding predictor, holding all else constant. The LPM is the simplest model for binary outcomes, but it can produce predicted probabilities outside the [0, 1] range for extreme observations.

Why is the linear probability model always heteroskedastic?

Because the dependent variable is binary, the conditional variance of the error term equals the conditional variance of Y itself: Var(u | X) = Var(Y | X) = p(X)[1 − p(X)], which depends on X by construction. This means the LPM is always heteroskedastic whenever any slope coefficient is nonzero. The usual OLS standard errors assume constant error variance (homoskedasticity), so they are invalid for the LPM. Heteroskedasticity-robust (White) standard errors correct for this problem and should always be reported. For a full treatment of heteroskedasticity and robust inference, see our guide on heteroskedasticity.

Does heteroskedasticity bias the LPM's coefficient estimates?

Only the standard errors are affected. Under correct exogeneity assumptions (Wooldridge’s MLR.1–MLR.4), OLS coefficients remain unbiased and consistent regardless of whether the errors are heteroskedastic. The problem is purely one of inference: t-statistics, p-values, and confidence intervals computed with standard (non-robust) standard errors are unreliable. The coefficient estimates themselves — the point estimates of the marginal effects — are not affected by heteroskedasticity.

Can the LPM predict probabilities below 0 or above 1?

Yes — this is one of the LPM’s well-known limitations. Because the model is a linear function of the regressors, nothing constrains the fitted values to the [0, 1] interval. For borrowers with very strong credit profiles (high credit score, low debt-to-income ratio), the LPM can produce negative predicted “probabilities.” Similarly, very high-risk profiles can yield predictions above 1. When this occurs for a substantial share of the sample, it signals that a nonlinear model — logit or probit — would be more appropriate for the data.

How different are LPM and logit or probit results in practice?

In the interior of the probability distribution — roughly when predicted probabilities fall between 0.2 and 0.8 — LPM coefficients and logit (or probit) average marginal effects are typically close in magnitude. The models diverge most at extreme probabilities near 0 or 1, where the logit and probit marginal effects compress toward zero (reflecting their S-shaped curves), while the LPM imposes the same constant effect everywhere. For most applied research questions focused on average partial effects, all three models tell a similar story. The differences become meaningful when the application requires accurate probability predictions near the boundaries.

Disclaimer

This article is for educational and informational purposes only and does not constitute financial or investment advice. The numerical examples are illustrative and do not represent actual lending data. Content is based on Wooldridge, Jeffrey M., Introductory Econometrics: A Modern Approach, 8th Edition, Cengage, 2025. Always consult qualified professionals for specific financial decisions.