Logit & Probit Models: Binary Dependent Variables in Regression

When a bank evaluates a loan application, the outcome is binary: default or no default. An analyst might be tempted to use ordinary least squares to predict this 0/1 outcome, but OLS can produce predicted “probabilities” below zero or above one — nonsensical results for a probability model. The logit model solves this problem by mapping the linear predictor through the logistic cumulative distribution function (CDF), constraining all predicted probabilities to the interval (0, 1). The closely related probit model uses the standard normal CDF instead. Both are estimated by maximum likelihood and have become the standard tools for modeling binary dependent variables in finance and economics. This guide covers the logit and probit specifications, maximum likelihood estimation, how to interpret coefficients through marginal effects and odds ratios, and when to choose logit versus probit versus the linear probability model.

The Problem with OLS for Binary Outcomes

When the dependent variable takes only the values 0 and 1, applying OLS produces the linear probability model (LPM). The LPM interprets the fitted values as probabilities: P̂(Y = 1 | X) = β̂0 + β̂1X1 + … + β̂kXk. While simple and easy to interpret, the LPM has three fundamental limitations for binary outcomes:

  • Predicted probabilities outside [0, 1]. Nothing prevents the linear function from producing fitted values of −0.15 or 1.30 — values that have no meaning as probabilities. In credit scoring with extreme borrower profiles, this occurs routinely.
  • Inherent heteroskedasticity. Because Var(Y | X) = P(X)[1 − P(X)], the error variance depends on X by construction whenever the slope coefficients are nonzero. Standard errors require robust corrections at minimum.
  • Constant marginal effects are unrealistic. The LPM assumes a 1-unit increase in credit score has the same effect on default probability whether the borrower starts at 5% or 50% risk. In reality, marginal effects should diminish near the boundaries.

The LPM Is Not Always Wrong

For observations with predicted probabilities between roughly 0.2 and 0.8, the LPM often agrees closely with logit and probit. Its simplicity makes it useful for quick analysis and difference-in-differences designs. The limitations become severe only when predicted probabilities approach 0 or 1. For a full treatment, see our guide on the linear probability model.
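
To see the boundary problem concretely, here is a minimal sketch that fits both an LPM and a logit to simulated loan data and compares fitted values for an extreme borrower profile. The data-generating process, variable names, and sample size are illustrative assumptions (not part of the worked example later in this guide), and the code assumes numpy and statsmodels are installed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2_000

# Illustrative predictors: credit score and debt-to-income ratio
score = rng.normal(690, 60, n)
dti = rng.uniform(5, 60, n)

# Simulate defaults from a logistic data-generating process
index = 3.2 - 0.008 * score + 0.045 * dti
default = rng.binomial(1, 1 / (1 + np.exp(-index)))

X = sm.add_constant(np.column_stack([score, dti]))
lpm = sm.OLS(default, X).fit()            # linear probability model
logit = sm.Logit(default, X).fit(disp=0)  # logistic regression

# Fitted "probabilities" for an extreme profile: score 850, DTI 5%
x_extreme = np.array([[1.0, 850.0, 5.0]])
print("LPM:  ", lpm.predict(x_extreme))    # typically negative for this profile
print("Logit:", logit.predict(x_extreme))  # always strictly inside (0, 1)
```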

The Logit Model (Logistic Regression)

Logit Model Definition

The logit model (logistic regression) models the probability of a binary outcome by passing the linear index β0 + β1X1 + … + βkXk through the logistic CDF. This ensures that all predicted probabilities lie strictly between 0 and 1, regardless of the values of the independent variables.

Logit Model
P(Y = 1 | X) = exp(β0 + β1X1 + … + βkXk) / [1 + exp(β0 + β1X1 + … + βkXk)]
The logistic CDF maps any real-valued linear index to the interval (0, 1), producing an S-shaped probability curve

Where:

  • P(Y = 1 | X) — the predicted probability that the binary outcome equals 1, given the values of the independent variables
  • Y — the binary dependent variable (0 or 1)
  • β0 — the intercept
  • β1 … βk — the slope coefficients for each independent variable
  • X1 … Xk — the independent (explanatory) variables
  • exp(·) — the exponential function

The logistic function is symmetric around zero: when the linear index equals zero, the predicted probability is exactly 0.5. As the index increases toward positive infinity, the probability approaches 1; as it decreases toward negative infinity, the probability approaches 0. This S-shaped curve naturally captures diminishing marginal effects near the boundaries.
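
A quick numerical illustration of the S-curve (a minimal sketch; the index values are arbitrary):

```python
import numpy as np

def logistic_cdf(z):
    """Map any real-valued linear index to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

for z in [-4, -1, 0, 1, 4]:
    print(f"index = {z:+d}  ->  P(Y=1|X) = {logistic_cdf(z):.3f}")
# index 0 gives exactly 0.5; large |index| pushes the probability toward 0 or 1
```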

The logit model can be equivalently expressed through the log-odds (logit) transformation, which linearizes the relationship:

Log-Odds (Logit Transformation)
log[P / (1 − P)] = β0 + β1X1 + … + βkXk
Each coefficient βj represents the change in the log-odds of Y = 1 for a one-unit increase in Xj

Where:

  • P — the predicted probability P(Y = 1 | X)
  • P / (1 − P) — the odds of Y = 1 (the ratio of the probability of the event to the probability of the non-event)
  • log[P / (1 − P)] — the log-odds (the natural logarithm of the odds), also called the logit

This leads to a powerful interpretation: exponentiating a coefficient gives the odds ratio. If β1 = 0.045, then exp(0.045) = 1.046, meaning a one-unit increase in X1 multiplies the odds of Y = 1 by 1.046 (a 4.6% increase in the odds), holding all other variables constant.
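
The odds-ratio arithmetic can be verified in a few lines. The coefficient is taken from the paragraph above; the starting probability of 0.30 is an arbitrary illustrative assumption:

```python
import numpy as np

beta_1 = 0.045
odds_ratio = np.exp(beta_1)     # 1.046: odds multiply by ~1.046 per unit of X1
p0 = 0.30                       # assumed starting probability (illustrative)
odds0 = p0 / (1 - p0)           # 0.429
odds1 = odds0 * odds_ratio      # odds after a one-unit increase in X1
p1 = odds1 / (1 + odds1)        # back on the probability scale: ~0.310
print(odds_ratio, p0, p1)       # the probability rises by ~3%, not 4.6%: odds and probabilities differ
```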

The logit model also has a latent variable interpretation: an unobserved continuous variable y* = β0 + βX + e determines the binary outcome, with Y = 1 when y* > 0. In the logit model, the error term e follows a logistic distribution (mean 0, variance π²/3).
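
The latent-variable story can be checked by simulation: draw logistic errors, set Y = 1 whenever y* crosses zero, and confirm that a logit fit approximately recovers the coefficients. The sample size and "true" parameter values below are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 50_000
beta0, beta1 = -1.0, 2.0              # assumed true parameters

x = rng.normal(size=n)
e = rng.logistic(size=n)              # logistic errors: mean 0, variance pi^2/3
y_star = beta0 + beta1 * x + e        # unobserved latent variable
y = (y_star > 0).astype(int)          # observed binary outcome

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)                     # should be close to [-1.0, 2.0]
```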

The Probit Model

The probit model replaces the logistic CDF with the standard normal CDF, denoted Φ(·):

Probit Model
P(Y = 1 | X) = Φ(β0 + β1X1 + … + βkXk)
The standard normal CDF Φ(·) also maps the linear index to (0, 1), producing predicted probabilities nearly identical to logit in practice

Where:

  • Φ(·) — the cumulative distribution function (CDF) of the standard normal distribution, which maps any real number to a probability between 0 and 1

In the latent variable framework, the probit model assumes the error follows a standard normal distribution (mean 0, variance 1) rather than a logistic distribution. Because the standard normal has a smaller variance than the logistic distribution (π²/3 ≈ 3.29), probit coefficients are systematically smaller: logit coefficients are approximately 1.6 times probit coefficients. This scaling difference is mechanical — it does not reflect a substantive disagreement between the models.

In practice, logit and probit produce nearly identical predicted probabilities and marginal effects. The logit model has slightly heavier tails, but the difference is rarely meaningful in applied work.
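
A small simulation makes both points at once: the coefficient ratio hovers around the usual scale factor, while the predicted probabilities are almost indistinguishable. The data-generating process below is an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.2 * x))))

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)

print("coefficient ratio (logit / probit):", logit.params / probit.params)  # roughly 1.6-1.8
print("largest probability difference:",
      np.abs(logit.predict(X) - probit.predict(X)).max())                   # small in practice
```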

Pro Tip

In applied finance, logit is the more common choice because credit risk analysts rely on odds ratios for scorecard development and coefficient interpretation. In academic econometrics, probit appears slightly more often because the normal distribution connects naturally to the rest of the statistical theory. In most applications, the results are functionally interchangeable.

Maximum Likelihood Estimation

Because the logit and probit models are nonlinear in the parameters, OLS cannot be used. Instead, both models are estimated by maximum likelihood estimation (MLE), which finds the parameter values that make the observed data most probable.

Maximum Likelihood Estimation

MLE chooses the coefficient vector β that maximizes the likelihood of observing the actual sample outcomes (the observed pattern of 0s and 1s). For binary data with a Bernoulli distribution, the log-likelihood has a clean, well-behaved form that can be maximized numerically.

For each observation i, the contribution to the likelihood is P(Xi)^yi × [1 − P(Xi)]^(1 − yi). Taking the logarithm and summing across all n observations gives the log-likelihood:

Log-Likelihood Function
ln L = Σ(i=1 to n) [yi ln P(Xi) + (1 − yi) ln(1 − P(Xi))]
MLE maximizes this function iteratively (typically via Newton-Raphson) to find the parameter values most consistent with the observed data

Where:

  • ln L — the log-likelihood (the natural logarithm of the total likelihood of the observed data)
  • yi — the observed binary outcome for observation i (0 or 1)
  • P(Xi) — the predicted probability of Y = 1 for observation i, given its covariate values
  • n — the number of observations in the sample

There is no closed-form solution — the computer uses iterative numerical methods to find the maximum. Under standard regularity conditions, the MLE is consistent, asymptotically normal, and asymptotically efficient. Standard errors are computed from the inverse of the Hessian matrix (the matrix of second derivatives of the log-likelihood).
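
The sketch below writes the Bernoulli log-likelihood out directly, maximizes it numerically (by minimizing its negative), and checks the answer against statsmodels' built-in Logit estimator. The simulated data and true parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5_000
X = sm.add_constant(rng.normal(size=(n, 2)))            # intercept + 2 regressors
true_beta = np.array([0.5, -1.0, 0.8])                  # assumed true values
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

def neg_log_likelihood(beta):
    p = 1 / (1 + np.exp(-X @ beta))                     # logistic CDF
    p = np.clip(p, 1e-12, 1 - 1e-12)                    # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, x0=np.zeros(3), method="BFGS")
print("hand-rolled MLE:", res.x)
print("statsmodels:    ", sm.Logit(y, X).fit(disp=0).params)  # should match closely
```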

For testing multiple restrictions simultaneously (the logit/probit analog of the F-test in OLS), the Likelihood Ratio (LR) test compares the log-likelihoods of the restricted and unrestricted models: LR = −2(ln Lr − ln Lur) ~ χ²(q), where q is the number of restrictions. For the OLS version of multiple hypothesis testing, see our guide on hypothesis testing in regression.
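
In code, the LR test only needs the two fitted log-likelihoods. The sketch below simulates data, drops one regressor whose true coefficient is zero (so q = 1), and compares the statistic to the chi-squared distribution; all names and values are illustrative.

```python
import numpy as np
from scipy.stats import chi2
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5_000
X_full = sm.add_constant(rng.normal(size=(n, 3)))        # intercept + 3 regressors
true_beta = np.array([0.2, 0.9, -0.6, 0.0])              # last coefficient is truly zero
y = rng.binomial(1, 1 / (1 + np.exp(-X_full @ true_beta)))

X_restricted = X_full[:, :3]                             # drop the last regressor (q = 1)
full = sm.Logit(y, X_full).fit(disp=0)
restricted = sm.Logit(y, X_restricted).fit(disp=0)

lr_stat = -2 * (restricted.llf - full.llf)               # LR = -2(ln Lr - ln Lur)
p_value = chi2.sf(lr_stat, df=1)                         # upper tail of chi-squared(q = 1)
print(lr_stat, p_value)
```

For the special case where the restricted model is the intercept-only specification, statsmodels reports the statistic and its p-value directly as `full.llr` and `full.llr_pvalue`.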

Model fit is assessed with McFadden’s pseudo R-squared: 1 − (ln Lfull / ln Lnull), where the null model includes only an intercept. As a rough heuristic, pseudo R-squared values of 0.2 to 0.4 are often seen in well-fitting binary models, though the appropriate range depends on the application and data. The percent correctly predicted — the share of observations where the predicted probability correctly classifies Y at a 0.5 threshold — provides an intuitive but crude alternative. However, this metric depends heavily on the chosen cutoff and can be misleading with imbalanced samples: in a loan portfolio where only 5% of borrowers default, a model that predicts “no default” for everyone achieves 95% accuracy while being completely uninformative.
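
Both fit measures are one-liners once the model is estimated. The sketch below uses a simulated rare-event sample (all values illustrative) to show why raw accuracy can flatter an uninformative benchmark.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 10_000
x = rng.normal(size=n)
X = sm.add_constant(x)
y = rng.binomial(1, 1 / (1 + np.exp(-(-3.0 + 0.8 * x))))   # rare-ish event (~5-7% base rate)

res = sm.Logit(y, X).fit(disp=0)

pseudo_r2 = 1 - res.llf / res.llnull          # McFadden's pseudo R-squared
print(pseudo_r2, res.prsquared)               # statsmodels reports the same quantity

p_hat = res.predict(X)
pct_correct = np.mean((p_hat >= 0.5) == (y == 1))   # share classified correctly at a 0.5 cutoff
print(pct_correct)

# With a rare event, the trivial "always predict no default" rule already looks accurate:
print(1 - np.mean(y))                         # accuracy of the uninformative benchmark
```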

Interpreting Logit and Probit Coefficients

Coefficients Are Not Marginal Effects

A logit coefficient of −0.008 does not mean the probability decreases by 0.008. In logit and probit models, the marginal effect of any variable depends on the values of all regressors through the nonlinear link function. You must compute marginal effects explicitly — the raw coefficients only indicate the direction and relative magnitude of effects.

The partial effect (marginal effect) of a continuous variable xj on the predicted probability is:

Marginal Effect of xj
∂P / ∂xj = g(β0 + βX) · βj
The marginal effect equals the density function times the coefficient, and varies with the values of all regressors.

Where:

  • ∂P / ∂xj — the marginal effect of variable xj on the predicted probability
  • g(·) — the density function of the link distribution: the logistic density g(z) = P(1 − P) for logit, or the standard normal density φ(z) for probit
  • β0 + βX — the linear index (the linear combination of all regressors and their coefficients)
  • βj — the estimated coefficient on variable xj

The scale factor g(z) is always positive, so the marginal effect has the same sign as βj. Because the marginal effect varies with X, two standard summary measures are used:

  • Marginal Effect at the Mean (MEM): Evaluate the partial effect at the sample mean values of all regressors. Simple to compute, but the “average borrower” may not be representative of any actual borrower.
  • Average Marginal Effect (AME): Compute the partial effect for every observation in the sample, then average. This uses the full distribution of X values and is generally preferred by applied researchers.

For logit, the maximum scale factor is 0.25 (when the predicted probability equals 0.5), so the largest possible marginal effect for any variable is 0.25 × βj. For probit, the maximum scale factor is approximately 0.40 (at z = 0).
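
statsmodels computes both summaries directly via `get_margeff`; the sketch below also rebuilds the AME by hand so the scale-factor logic is explicit. The simulated data and parameter values are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 10_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2]))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([-0.4, 0.8, -0.5]))))

res = sm.Logit(y, X).fit(disp=0)

ame = res.get_margeff(at="overall")    # Average Marginal Effect
mem = res.get_margeff(at="mean")       # Marginal Effect at the Mean
print(ame.summary())

# AME by hand: average the scale factor P(1 - P) over the sample, times each slope
p = res.predict(X)
scale = p * (1 - p)                            # logistic density at each observation
ame_by_hand = scale.mean() * res.params[1:]    # slopes only (intercept excluded)
print(ame_by_hand, ame.margeff)                # the two should agree
```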

Binary Regressors: Use Discrete Changes, Not Derivatives

The derivative formula above applies only to continuous regressors. For binary (indicator) variables — such as “has collateral” (0 or 1) — compute the discrete change in predicted probability: ΔP = P(Y = 1 | d = 1, X) − P(Y = 1 | d = 0, X), holding all other variables at the same values. This gives the exact effect of switching the indicator from 0 to 1, rather than an approximation based on the derivative.

Loan Default Prediction: Logit Model Example

A bank models the probability of loan default (Y = 1 if default, 0 otherwise) using credit score and debt-to-income ratio (DTI) as predictors. The estimated logit model produces:

Variable                 Coefficient   Odds Ratio
Intercept                      3.200
Credit Score                  −0.008        0.992
DTI (%)                        0.045        1.046
Has Collateral (0/1)          −0.620        0.538

Borrower A: Credit Score = 700, DTI = 35%

  • Linear index: Xβ = 3.200 + (−0.008)(700) + (0.045)(35) = 3.200 − 5.600 + 1.575 = −0.825
  • Predicted probability: P = exp(−0.825) / [1 + exp(−0.825)] = 0.4385 / 1.4385 = 0.305 (30.5%)
  • Marginal effect of credit score: 0.305 × (1 − 0.305) × (−0.008) = −0.0017 — a 1-point increase in credit score reduces default probability by 0.17 percentage points
  • Odds ratio for DTI: exp(0.045) = 1.046 — a 1-percentage-point increase in DTI multiplies the odds of default by 1.046 (4.6% higher odds)

Borrower B: Credit Score = 750, DTI = 35% (same DTI, higher credit score)

  • Linear index: Xβ = 3.200 + (−0.008)(750) + (0.045)(35) = 3.200 − 6.000 + 1.575 = −1.225
  • Predicted probability: P = exp(−1.225) / [1 + exp(−1.225)] = 0.2938 / 1.2938 = 0.227 (22.7%)

Discrete change for collateral (binary regressor): For Borrower A’s profile (Score = 700, DTI = 35%), compare collateral = 0 vs. collateral = 1:

  • Without collateral: Xβ = −0.825 + (−0.620)(0) = −0.825 → P = 0.305
  • With collateral: Xβ = −0.825 + (−0.620)(1) = −1.445 → P = exp(−1.445) / [1 + exp(−1.445)] = 0.191 (19.1%)
  • Discrete change: 0.191 − 0.305 = −0.114 — having collateral reduces default probability by 11.4 percentage points

Holding DTI constant at 35%, a 50-point credit score improvement (700 → 750) reduces default probability from 30.5% to 22.7% — a 7.8-percentage-point drop. This illustrates the nonlinear nature of the logit model: the marginal effect of credit score is larger at higher baseline risk levels and diminishes as the predicted probability moves away from 0.5. The collateral example shows how binary regressors require a discrete-change approach rather than the derivative formula.
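
The borrower calculations above, including the discrete change for collateral, can be reproduced in a few lines. The coefficients are taken from the example table; the helper function name is arbitrary.

```python
import numpy as np

beta = {"const": 3.200, "score": -0.008, "dti": 0.045, "collateral": -0.620}

def default_prob(score, dti, collateral=0):
    """Predicted default probability from the example logit model."""
    z = (beta["const"] + beta["score"] * score
         + beta["dti"] * dti + beta["collateral"] * collateral)
    return 1 / (1 + np.exp(-z))

p_a = default_prob(700, 35)                         # Borrower A: ~0.305
p_b = default_prob(750, 35)                         # Borrower B: ~0.227
me_score_a = p_a * (1 - p_a) * beta["score"]        # ~ -0.0017 per credit-score point
delta_collateral = default_prob(700, 35, 1) - p_a   # ~ -0.114 (discrete change)
print(p_a, p_b, me_score_a, delta_collateral)
```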

Logit vs. Probit vs. Linear Probability Model

Logit Model

  • Link function: Logistic CDF
  • Predicted probabilities: Always in (0, 1)
  • Coefficients: Log-odds; convert to odds ratios via exp(β)
  • Estimation: Maximum likelihood
  • Marginal effects: Vary with X; must compute MEM or AME
  • Best for: Large samples, boundary probabilities matter, odds ratio interpretation valued

Probit Model

  • Link function: Standard normal CDF
  • Predicted probabilities: Always in (0, 1)
  • Coefficients: Index units; no direct odds ratio interpretation
  • Estimation: Maximum likelihood
  • Marginal effects: Vary with X; must compute MEM or AME
  • Best for: Academic research, latent variable interpretation, normal-distribution-based theory

Linear Probability Model

  • Link function: None (identity)
  • Predicted probabilities: Can fall below 0 or exceed 1
  • Coefficients: Direct marginal effects (constant)
  • Estimation: OLS
  • Marginal effects: Coefficients ARE marginal effects
  • Best for: Quick analysis, interior probabilities, difference-in-differences designs

In practice, all three models produce similar marginal effects when predicted probabilities are well within the (0.2, 0.8) range. The choice matters most at the tails — when modeling events with very high or very low base rates (such as loan defaults, which typically occur in 1–5% of a portfolio), logit and probit provide more reliable probability estimates than the LPM. When comparing logit and probit results, always compare average marginal effects (AMEs), not raw coefficients — the raw coefficients differ by a factor of approximately 1.6 due to the different scale of the underlying distributions, but the AMEs will be nearly identical.
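
The advice to compare AMEs rather than raw coefficients is easy to verify on simulated data (a minimal sketch; the sample size and true parameters are arbitrary assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
X = sm.add_constant(x)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 0.9 * x))))

logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)
lpm = sm.OLS(y, X).fit()

print("raw slope ratio (logit / probit):", logit.params[1] / probit.params[1])  # ~1.6-1.8
print("AME, logit :", logit.get_margeff(at="overall").margeff)
print("AME, probit:", probit.get_margeff(at="overall").margeff)
print("LPM slope  :", lpm.params[1])   # the LPM coefficient is itself a (constant) marginal effect
```

All three summaries should be close when the predicted probabilities stay in the interior of (0, 1).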

Applications in Finance

Logit and probit models are widely used across financial applications wherever the outcome is naturally binary:

  • Credit scoring and probability of default. Banks estimate the probability that a borrower will default on a loan using financial ratios, credit history, and macroeconomic variables as predictors. Logit is the dominant model in the industry due to its odds ratio interpretation. For the full probability of default framework including Basel regulatory requirements, see our guide on credit risk and probability of default.
  • Customer churn prediction. Financial services firms model whether a client will close an account (Y = 1) or remain (Y = 0) based on account activity, fee sensitivity, service interactions, and competitor pricing.
  • Merger and acquisition prediction. Researchers model whether a target firm will receive a takeover bid as a function of firm size, Tobin’s Q, leverage, free cash flow, and industry concentration.
  • Buy/sell signal models. Quantitative analysts predict whether a stock will outperform a benchmark over the next period (Y = 1) or not (Y = 0), using fundamental ratios, technical indicators, and momentum variables as inputs.

M&A Prediction: Logit Application

A researcher estimates a logit model for takeover bids among S&P 500 firms (Y = 1 if the firm receives a bid). Selected coefficients:

  • Leverage: β = 0.82, odds ratio = exp(0.82) = 2.27 — each 1-unit increase in leverage more than doubles the odds of receiving a takeover bid
  • Tobin’s Q: β = −0.35, odds ratio = exp(−0.35) = 0.70 — higher-valued firms are less likely targets (30% lower odds per unit of Q)
  • log(Assets): β = −0.18, odds ratio = exp(−0.18) = 0.84 — larger firms are harder to acquire

These odds ratios align with the corporate finance intuition that highly leveraged, undervalued, and smaller firms are more attractive acquisition targets.

Common Mistakes

1. Interpreting logit or probit coefficients as marginal effects. A logit coefficient of −0.008 does not mean the probability drops by 0.008. The marginal effect depends on the values of all regressors through the density function g(Xβ). Always report average marginal effects (AME) or marginal effects at the mean (MEM) alongside raw coefficients to convey the economic magnitude of the relationship.

2. Comparing pseudo R-squared to OLS R-squared. McFadden’s pseudo R-squared typically ranges from 0.2 to 0.4 for well-fitting binary models. A pseudo R-squared of 0.25 does not mean the model “only explains 25% of the variation” — it is a fundamentally different metric measured on a different scale. Comparing pseudo R-squared values across logit models is reasonable; comparing them to OLS R-squared values is not.

3. Not reporting marginal effects. Publishing only raw logit coefficients or odds ratios without marginal effects makes it impossible for readers to assess the economic magnitude of each predictor. Marginal effects translate results into probability-scale changes that can be compared across studies and models.

4. Ignoring finite-sample bias and separation. These are related but distinct problems. Finite-sample bias: MLE is a large-sample estimator, so with fewer than roughly 10 events (Y = 1) per estimated parameter, logit and probit coefficients can be substantially biased and standard errors inflated. A common rule of thumb is at least 10 events per predictor. Separation is a different issue: it occurs when a predictor (or combination of predictors) perfectly predicts the outcome for a subset of observations, causing one or more coefficient estimates to diverge toward ±∞. Separation can arise even in moderately large samples if a category has zero events — for example, if no borrower with collateral ever defaults in the sample.

5. Comparing raw coefficients across logit and probit models. Logit coefficients are approximately 1.6 times probit coefficients because the logistic distribution has a larger variance (π²/3 ≈ 3.29) than the standard normal (variance = 1). Comparing magnitudes directly without adjusting for this scale difference is meaningless. Compare marginal effects instead — they will be nearly identical across the two models.

Frequently Asked Questions

What is the difference between the logit and probit models?

Both models constrain predicted probabilities to the interval (0, 1), but they use different link functions. The logit model uses the logistic CDF, while the probit model uses the standard normal CDF. In practice, they produce nearly identical predicted probabilities and marginal effects. The main practical difference is interpretability: logit coefficients can be exponentiated to produce odds ratios (exp(β)), which are widely used in credit risk and medical research. Probit coefficients have no direct odds ratio interpretation but connect more naturally to latent variable theory. Logit coefficients are approximately 1.6 times larger than probit coefficients due to the different variances of the underlying distributions.

How do you interpret logit coefficients?

Logit coefficients represent the change in the log-odds of Y = 1 for a one-unit increase in the predictor, holding all other variables constant. They are not marginal effects on the probability scale. To convert to a more interpretable quantity, you can either: (1) exponentiate the coefficient to get the odds ratio — exp(0.045) = 1.046 means a one-unit increase multiplies the odds by 1.046; or (2) compute marginal effects by multiplying the coefficient by the logistic density P(1 − P) evaluated at specific X values (MEM) or averaged across all observations (AME). Use our logit and probit calculator to compute predicted probabilities and marginal effects for your own models.

When should you use logit or probit instead of the linear probability model?

Use logit or probit when: (1) predicted probabilities near 0 or 1 are important — for example, in credit scoring where many loans have very low default rates; (2) you need guaranteed probabilities within (0, 1) for downstream calculations; or (3) constant marginal effects are implausible. The linear probability model remains acceptable for quick analysis when most predicted probabilities fall in the 0.2–0.8 range, when direct coefficient interpretation is valued, or in difference-in-differences and instrumental variable designs where the LPM’s simplicity is advantageous.

What is a good McFadden pseudo R-squared?

McFadden’s pseudo R-squared compares the log-likelihood of the fitted model to that of an intercept-only (null) model: pseudo R² = 1 − (ln Lfull / ln Lnull). Unlike OLS R-squared, it does not measure the fraction of variance explained. As a rough heuristic, values of 0.2 to 0.4 are often seen in well-fitting binary models — do not judge a logit model harshly for having a pseudo R-squared of 0.30. Comparing pseudo R-squared values across different logit specifications is valid, but comparing them to OLS R-squared is not meaningful because the two metrics are defined on fundamentally different scales.

What is an odds ratio, and how is it interpreted?

An odds ratio is the exponentiated logit coefficient: OR = exp(βj). For a continuous variable, it gives the multiplicative change in the odds of Y = 1 for a one-unit increase in Xj. For example, an odds ratio of 1.046 means the odds increase by 4.6% per unit. For a binary variable (0/1 indicator), it gives the ratio of the odds when the indicator equals 1 versus 0. Importantly, odds ratios are not probability ratios — an odds ratio of 2.0 does not mean the probability doubles. The odds (P / [1 − P]) and the probability P are different quantities, and the relationship between them is nonlinear. Odds ratios are symmetric: if the odds ratio for having collateral is 0.538 (protective), the odds ratio for not having collateral is 1 / 0.538 = 1.859.

Disclaimer

This article is for educational and informational purposes only and does not constitute investment advice. The regression coefficients, predicted probabilities, and marginal effects used in examples are illustrative and may differ based on the data source, sample period, and model specification. Always conduct your own analysis and consult a qualified financial advisor before making investment decisions. Reference: Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 8th Edition, Cengage, 2025.