Simple Linear Regression: The OLS Method Explained
Simple linear regression is the foundation of econometric analysis. Whether you are estimating a stock’s sensitivity to market movements, testing an economic theory about the relationship between two variables, or building a forecasting model, understanding how ordinary least squares (OLS) works is essential. This guide covers the population model, OLS estimation, slope and intercept interpretation, R-squared as a measure of goodness of fit, the key assumptions required for OLS to be unbiased, and the Gauss-Markov theorem that establishes OLS as the best linear unbiased estimator.
What Is Simple Linear Regression?
Simple linear regression models the relationship between two variables: a dependent variable (Y) that you want to explain and a single independent variable (X) that you believe influences it. In finance, common applications include regressing a stock’s return on a market index return, a firm’s revenue on its advertising expenditure, or a bond’s yield on the prevailing interest rate. The population model is:
Y = β0 + β1X + u
where β0 is the intercept, β1 is the slope, and u is the error term — the sum of all unobserved factors that affect Y beyond X.
The error term u is not a nuisance to be ignored. It captures everything the model leaves out: omitted variables, measurement imprecision, and inherently random variation. The population model describes the true data-generating process, which we never observe directly. Instead, we estimate β0 and β1 from sample data.
When the zero conditional mean assumption holds — E(u|X) = 0 — we can write the population regression function as E(Y|X) = β0 + β1X. Under this condition, a one-unit increase in X changes the expected value of Y by β1. Causal language requires that the unobserved factors in u do not move systematically with X. For a broader introduction to econometric methods and the role of causal inference, see our guide on what is econometrics.
The OLS Method: Finding the Best-Fitting Line
Ordinary least squares (OLS) chooses estimates β̂0 and β̂1 to minimize the sum of squared residuals (SSR) — the total squared distance between the observed values and the fitted line. Why squared? Squaring penalizes large misses more heavily than small ones and produces closed-form solutions that are easy to compute.
Once we have these estimates, we compute two key quantities for every observation:
- Fitted value: ŷi = β̂0 + β̂1xi — the model’s prediction for observation i
- Residual: ûi = yi − ŷi — the difference between the actual and predicted value
An important distinction: the error term (u) is a population concept — the true unobserved deviation from the population regression function. The residual (û) is its sample counterpart — the deviation from the estimated regression line. We never observe u directly; we only observe û.
The residuals also allow us to estimate the error variance σ², which measures how dispersed the unobserved factors are around zero. Under the standard assumptions (introduced below), the following estimator is unbiased:

σ̂² = SSR / (n − 2) = (1 / (n − 2)) × Σûi²
We divide by (n − 2) rather than n to correct for the downward bias that would result from using the same data to both estimate the coefficients and assess the error variance. The square root of σ̂² is the standard error of the regression (SER), which tells you the typical size of a residual in the units of Y.
When the regression includes an intercept, OLS guarantees three algebraic properties: the residuals sum to zero (Σûi = 0), the residuals are sample-uncorrelated with X (Σxiûi = 0), and the fitted regression line passes through the point (x̄, ȳ) — the sample means of X and Y.
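The mechanics above can be sketched in a few lines of Python. This is an illustrative example on simulated data (the true coefficients and noise level are invented for the demonstration); it computes the OLS estimates from the sample covariance and variance, then checks the algebraic properties and the standard error of the regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(10, 3, n)
y = 2.0 + 1.5 * x + rng.normal(0, 2, n)   # simulated data, arbitrary true coefficients

# Slope = sample Cov(x, y) / sample Var(x); intercept makes the line pass through (x̄, ȳ)
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

fitted = beta0_hat + beta1_hat * x
resid = y - fitted

# The algebraic properties hold by construction (up to floating-point error)
print(resid.sum())           # numerically zero: residuals sum to zero
print((x * resid).sum())     # numerically zero: residuals uncorrelated with x

# Standard error of the regression: sqrt(SSR / (n − 2))
ser = np.sqrt((resid ** 2).sum() / (n - 2))
print(round(ser, 2))         # typical residual size, in the units of y
```

Note that the first two printed quantities are zero by the construction of OLS with an intercept, not because the model happens to fit well.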
Interpreting the Slope and Intercept
The OLS slope β̂1 tells you how much the predicted value of Y changes for a one-unit increase in X. If you regress a stock’s monthly return on the market’s monthly return and obtain β̂1 = 1.25, then each additional percentage point of market return is associated with a 1.25 percentage point change in the stock’s return.
The intercept β̂0 is the predicted value of Y when X equals zero. Depending on the context, this may or may not have a meaningful interpretation. In a market-return regression, β̂0 represents the stock’s predicted return when the market return is zero — economically interesting but not always the focus of the analysis.
Changing the units of measurement affects the coefficients predictably. If you multiply X by a constant c, the slope is divided by c, but the intercept remains unchanged — the predicted value of Y when X = 0 does not depend on how X is scaled. If you multiply Y by a constant instead, both the slope and the intercept are multiplied by that constant. These are purely mechanical effects that do not alter the underlying relationship.
Suppose you regress quarterly revenue (in millions of dollars) on advertising spending (in thousands of dollars) for a sample of 40 firms and obtain:
Predicted Revenue = 2.1 + 0.045 × Advertising
- Slope: Each additional $1,000 in advertising is associated with $0.045 million ($45,000) in additional revenue
- Intercept: A firm spending zero on advertising has a predicted revenue of $2.1 million
If you re-express advertising in millions instead of thousands, the slope becomes 45.0 and the intercept stays 2.1. The relationship is identical — only the units changed.
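The rescaling rules can be seen in action with a short sketch on simulated data. The figures 2.1 and 0.045 from the example are used only to generate the data, and the `ols` helper is a hypothetical convenience function defined here:

```python
import numpy as np

rng = np.random.default_rng(1)
adv_thousands = rng.uniform(50, 500, 40)                        # advertising, $ thousands (simulated)
revenue = 2.1 + 0.045 * adv_thousands + rng.normal(0, 1.5, 40)  # revenue, $ millions

def ols(x, y):
    """Simple-regression OLS: slope = Cov(x, y) / Var(x), line through the means."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

b0_a, b1_a = ols(adv_thousands, revenue)          # X in thousands
b0_b, b1_b = ols(adv_thousands / 1000, revenue)   # X re-expressed in millions

print(b1_b / b1_a)   # slope is multiplied by 1000
print(b0_b - b0_a)   # intercept is unchanged (numerically zero)
```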
The OLS slope measures a statistical association between X and Y. A causal interpretation — that changing X causes Y to change by β1 — requires the zero conditional mean assumption E(u|X) = 0 to hold. If omitted variables in the error term are correlated with X, the estimated slope conflates the effect of X with the influence of those omitted factors.
Goodness of Fit: R-Squared
R-squared (R²) measures how well the regression line fits the data. It quantifies the fraction of the sample variation in Y that is explained by X:

R² = ESS / SST = 1 − SSR / SST

Where:
- SST (Total Sum of Squares) = Σ(yi − ȳ)² — total variation in Y around its mean
- SSR (Residual Sum of Squares) = Σûi² — variation left unexplained by the model
- ESS (Explained Sum of Squares) = SST − SSR — variation explained by the regression
R² always falls between 0 and 1. A value of 0.65 means the regression explains 65% of the sample variation in Y. In simple linear regression, R² equals the square of the sample correlation between X and Y. For a deeper treatment of correlation and covariance, see our guide on correlation and covariance.
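The decomposition can be verified directly on simulated data (arbitrary parameters): R² computed from the sums of squares matches the squared sample correlation, as the text states.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 200)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 200)   # simulated data, arbitrary parameters

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

sst = ((y - y.mean()) ** 2).sum()   # total variation in y around its mean
ssr = (resid ** 2).sum()            # variation left unexplained
r2 = 1 - ssr / sst

corr = np.corrcoef(x, y)[0, 1]
print(abs(r2 - corr ** 2) < 1e-12)  # True: R² equals the squared correlation
```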
An analyst regresses the monthly return of an energy ETF on crude oil price changes over 48 months and obtains R² = 0.72. This means 72% of the month-to-month variation in the ETF’s return is explained by oil price movements. The remaining 28% reflects other factors: natural gas prices, refining margins, individual company news, and broader market sentiment.
By contrast, regressing a diversified large-cap equity fund on the S&P 500 might yield R² = 0.97, because the fund closely tracks the index. R-squared is context-dependent — what counts as “high” or “low” depends on the variables and the question being asked.
A model with R² = 0.30 can still have a highly statistically significant and economically meaningful slope. R-squared measures in-sample fit, not the importance of the relationship or the model’s predictive power on new data. In finance and economics, R-squared values of 0.10 to 0.40 are common and perfectly acceptable.
Simple Linear Regression Assumptions and OLS Properties
The desirable statistical properties of OLS depend on a set of assumptions about the population model and the data. Wooldridge labels these SLR.1 through SLR.5:
| Assumption | Statement | Why It Matters |
|---|---|---|
| SLR.1 | Linear in parameters: Y = β0 + β1X + u | Defines the population model OLS is designed to estimate |
| SLR.2 | Random sampling: {(xi, yi): i = 1, …, n} is a random sample | Ensures each observation is drawn independently from the same population |
| SLR.3 | Sample variation in X: the xi values are not all the same | Without variation in X, the slope formula divides by zero |
| SLR.4 | Zero conditional mean: E(u|X) = 0 | The most critical assumption — required for unbiasedness and causal interpretation |
| SLR.5 | Homoskedasticity: Var(u|X) = σ² | Constant error variance — needed for the Gauss-Markov efficiency result |
Note that SLR.1 says “linear in parameters” — the relationship between X and Y need not be literally a straight line. Models like Y = β0 + β1log(X) + u or Y = β0 + β1X² + u are still linear in β0 and β1, so OLS applies. What matters is that the parameters enter the equation linearly, not that X appears in raw form.
Under assumptions SLR.1 through SLR.4, the OLS estimators are unbiased: E(β̂1) = β1. This means that if you were to draw many random samples and estimate the slope each time, the average of those estimates would equal the true population slope. Unbiasedness is a property of the estimation procedure, not of any single estimate — any particular sample may yield a slope that is above or below the true value.
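Unbiasedness is a statement about repeated sampling, which a small Monte Carlo experiment makes concrete. The sketch below (parameter choices are arbitrary) draws many samples from a population where E(u|X) = 0 holds by construction and averages the estimated slopes:

```python
import numpy as np

rng = np.random.default_rng(3)
true_b1, n, reps = 1.5, 30, 5000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(0, 2, n)
    u = rng.normal(0, 1, n)          # E(u|X) = 0 holds by construction
    y = 0.5 + true_b1 * x + u
    slopes[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Any single estimate misses the target, but the average across samples does not
print(slopes.min(), slopes.max())    # individual estimates scatter around 1.5
print(round(slopes.mean(), 2))       # close to the true slope of 1.5
```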
Under assumptions SLR.1 through SLR.5, OLS is the Best Linear Unbiased Estimator (BLUE). “Best” means that among all linear estimators that are unbiased, OLS has the smallest variance. No other linear unbiased estimator can produce more precise estimates than OLS when these assumptions hold.
When assumptions are violated, the consequences depend on which assumption fails:
- SLR.4 violated (zero conditional mean fails): OLS is biased — the slope systematically over- or underestimates β1. This is the most serious problem in applied work and typically arises from omitted variable bias.
- SLR.5 violated (heteroskedasticity): OLS remains unbiased, but the usual standard errors are incorrect. Hypothesis tests and confidence intervals become unreliable. Heteroskedasticity-robust standard errors can fix this.
- SLR.2 violated (non-random sampling): Sample selection issues can bias both the slope and intercept. For example, studying only profitable firms when profitability is related to the dependent variable.
For formal methods of testing whether the slope is statistically different from zero, see our guide on hypothesis testing in regression.
The variance of the OLS slope estimator is Var(β̂1) = σ² / Σ(xi − x̄)². Two practical implications follow: (1) more data (larger n) increases the denominator and reduces variance, and (2) more spread in X values also reduces variance. When designing a study, choosing a sample with wide variation in X yields more precise slope estimates.
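Both implications can be checked by simulation. This sketch (arbitrary parameters) compares the empirical variance of the slope estimator against the formula σ² / Σ(xi − x̄)², and shows that doubling the spread of X cuts the variance to roughly a quarter:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n, reps = 1.0, 40, 4000
x = rng.normal(0, 1, n)   # regressor held fixed across replications

def sim_slope_var(x):
    """Empirical variance of the OLS slope over many simulated samples."""
    slopes = np.empty(reps)
    for r in range(reps):
        y = 1.0 + 2.0 * x + rng.normal(0, sigma, n)
        slopes[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return slopes.var()

theory = sigma ** 2 / ((x - x.mean()) ** 2).sum()   # Var(beta1_hat) formula
print(sim_slope_var(x) / theory)        # close to 1: simulation matches the formula
print(sim_slope_var(2 * x) / theory)    # close to 0.25: wider spread, smaller variance
```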
Finance Example: Estimating Market Sensitivity
Simple linear regression is widely used in finance to estimate how sensitive a stock’s returns are to overall market movements. Consider an analyst who collects 60 months of return data for Apple (AAPL) and the S&P 500 index, then runs the regression:
R̂AAPL,t = β̂0 + β̂1 × RS&P500,t
| Statistic | Estimate | Interpretation |
|---|---|---|
| Slope (β̂1) | 1.25 | A 1% increase in S&P 500 return is associated with a 1.25% increase in Apple’s return |
| Intercept (β̂0) | 0.30% | Apple’s predicted monthly return when the market return is zero |
| R² | 0.58 | 58% of Apple’s return variation is explained by market movements |
| n | 60 | Five years of monthly observations |
The estimated slope of 1.25 tells us that Apple’s returns are about 25% more sensitive to market movements than the market benchmark (which by definition has a slope of 1.0). The R² of 0.58 indicates that market-wide forces account for more than half of Apple’s return variation, with the remaining 42% attributable to firm-specific factors captured in the residuals.
Finance practitioners refer to this estimated slope as the stock’s beta — a measure of systematic risk. This regression is an application of the market model — a time-series regression of realized stock returns on realized market returns. Finance theory, particularly the Capital Asset Pricing Model (CAPM), motivates this specification. Here, we use it purely as an illustration of OLS mechanics: how the slope, intercept, and R-squared are estimated and interpreted.
The quality of your regression depends on the data behind it. Use at least 36 to 60 monthly observations for stable estimates. Shorter windows introduce noise; longer windows risk including periods where the company’s risk profile was fundamentally different (e.g., before a major acquisition or industry shift). Always check whether the residuals reveal obvious patterns that suggest the linear model is misspecified.
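The market-model regression is straightforward to run in Python. The sketch below uses simulated monthly returns — not actual AAPL or S&P 500 data — generated to roughly match the table above, then recovers beta, the intercept, and R²:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 60                                             # five years of monthly data
mkt = rng.normal(0.8, 4.0, T)                      # market return, % per month (simulated)
stock = 0.3 + 1.25 * mkt + rng.normal(0, 3.0, T)   # firm-specific noise in the error term

beta = np.cov(mkt, stock, ddof=1)[0, 1] / np.var(mkt, ddof=1)  # estimated slope (beta)
alpha = stock.mean() - beta * mkt.mean()                        # estimated intercept
resid = stock - (alpha + beta * mkt)
r2 = 1 - (resid ** 2).sum() / ((stock - stock.mean()) ** 2).sum()

print(round(beta, 2))   # near the true sensitivity of 1.25
print(round(r2, 2))     # share of return variation explained by the market
```

With only 60 observations, the estimate will deviate noticeably from 1.25 in any single sample — which is exactly why practitioners care about the estimation window discussed above.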
Simple vs. Multiple Regression
Simple regression uses a single independent variable. When the analysis requires controlling for additional factors, the model extends to multiple regression. Understanding the distinction helps you decide which approach fits your research question.
Simple Regression
- One independent variable (X)
- Slope = sample Cov(X,Y) / sample Var(X)
- Captures total association between X and Y
- Susceptible to omitted variable bias if relevant factors are excluded
- Best for: bivariate relationships, preliminary analysis
Multiple Regression
- Two or more independent variables
- Each slope is a partial effect (holding other variables constant)
- Controls for confounding factors
- Reduces omitted variable bias when relevant controls are included
- Best for: isolating individual effects, testing theories
Suppose you regress a firm’s stock return on the market return and find a slope of 0.90. But the firm is in the energy sector, and oil prices also drive its returns. If oil price changes are correlated with the market return, the simple regression slope of 0.90 mixes the effect of the market with the effect of oil prices. A multiple regression that includes both the market return and oil price changes would separate these influences and produce a more accurate estimate of market sensitivity.
Simple regression gives unbiased estimates only if no omitted variable is both correlated with X and affects Y. When that condition is unlikely to hold, multiple regression provides a path to more credible estimates by explicitly controlling for additional factors.
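A simulation makes the omitted variable bias concrete. In the sketch below (arbitrary parameters), returns depend on both the market and oil, oil is correlated with the market, and the simple-regression slope absorbs part of the oil effect, while the multiple regression recovers the true market sensitivity:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 5000
mkt = rng.normal(0, 4, T)                       # market return
oil = 0.5 * mkt + rng.normal(0, 3, T)           # oil price change, correlated with mkt
ret = 0.2 + 0.9 * mkt + 0.4 * oil + rng.normal(0, 2, T)

# Simple regression omits oil, which sits in the error term and moves with mkt
b_simple = np.cov(mkt, ret, ddof=1)[0, 1] / np.var(mkt, ddof=1)
print(round(b_simple, 2))    # near 0.9 + 0.4 * 0.5 = 1.1, biased above the true 0.9

# Multiple regression on [1, mkt, oil] recovers the partial effect of the market
X = np.column_stack([np.ones(T), mkt, oil])
b_multi, *_ = np.linalg.lstsq(X, ret, rcond=None)
print(round(b_multi[1], 2))  # near the true value of 0.9
```

The bias in the simple slope equals the omitted variable's effect (0.4) times the slope of oil on the market (0.5) — the standard omitted-variable-bias formula.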
Common Mistakes in Simple Regression
Simple linear regression is conceptually straightforward, but several common errors can lead to incorrect conclusions. Being aware of these pitfalls helps you interpret results more carefully:
1. Confusing the population model with the sample regression function. The population model Y = β0 + β1X + u includes the unobservable error term u and describes the true data-generating process. The sample regression function Ŷ = β̂0 + β̂1X produces fitted values from estimated coefficients. The two are conceptually different: the population model is what we are trying to learn about; the sample regression is our best estimate from available data.
2. Confusing the error term with the residual. The error term (u) is a theoretical population quantity that we never observe — it represents all unobserved factors affecting Y. The residual (û) is the observable sample analogue, calculated as the difference between actual and fitted values. Properties that hold for residuals (e.g., Σûi = 0) do not necessarily hold for errors.
3. Interpreting R-squared as prediction accuracy. R-squared measures the proportion of in-sample variation explained by the model, not how accurately the model predicts new data. A model with R² = 0.30 in finance is not “only 30% accurate” — it may still capture the most important economic relationship in the data.
4. Treating association as causation. The OLS slope measures a statistical association. Concluding that X causes changes in Y requires the zero conditional mean assumption E(u|X) = 0 to hold. When omitted factors in u are correlated with X, the slope confounds the effect of X with those omitted influences. For an introduction to causal reasoning in econometrics, see our guide on what is econometrics.
5. Ignoring omitted variable bias. In simple regression, any relevant variable that is excluded from the model and correlated with X will bias the slope estimate. For example, if you regress firm profitability on advertising spending without controlling for firm size, the advertising slope will partly reflect the influence of size. Multiple regression addresses this by adding control variables.
Disclaimer
This article is for educational and informational purposes only and does not constitute investment advice. The regression estimates used in examples are illustrative and may differ based on the data source, time period, and methodology. Always conduct your own analysis and consult a qualified financial advisor before making investment decisions. Reference: Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 8th Edition, Cengage, 2025.