Descriptive Statistics: Central Tendency and Dispersion
Descriptive statistics are the foundation of quantitative analysis in finance. Whether you’re evaluating a stock’s historical returns, comparing two mutual funds, or assessing portfolio risk, you need these tools to summarize raw data into actionable insights. This guide covers measures of central tendency, dispersion metrics, when to use each, and where they can mislead you.
What Are Descriptive Statistics?
Descriptive statistics are numerical measures that summarize and describe the main features of a dataset. In finance, they transform thousands of daily or monthly returns into a handful of numbers that capture a distribution’s center (central tendency) and spread (dispersion).
Descriptive statistics answer “what happened?” — they summarize historical data. This is distinct from inferential statistics, which use sample data to make predictions or test hypotheses about a larger population.
Financial analysts rely on descriptive statistics for several core tasks: comparing the average returns of different asset classes, measuring how much a portfolio’s returns vary from period to period, benchmarking a fund’s performance against an index, and identifying outliers that may signal data errors or unusual market events.
The two main categories are measures of central tendency (mean, median, mode, geometric mean) and measures of dispersion (range, mean absolute deviation, variance, standard deviation, coefficient of variation). Understanding when to use each is essential for accurate financial analysis.
Measures of Central Tendency
Central tendency measures describe where the “center” of a distribution lies. Different measures are appropriate for different situations.
Arithmetic Mean
The arithmetic mean is the sum of all values divided by the number of values — the familiar average. It’s the most commonly used measure of central tendency in finance, particularly for single-period expected returns.
The arithmetic mean is easy to calculate and uses all available data. However, it’s sensitive to extreme values — a single outlier can pull the mean significantly away from where most observations cluster.
Median
The median is the middle value when observations are sorted from lowest to highest. For an even number of observations, it’s the average of the two middle values. The median is more robust to outliers than the arithmetic mean.
In finance, the median is particularly useful for skewed distributions. For example, when analyzing household wealth or hedge fund returns, where a few extreme values can distort the arithmetic mean, the median often provides a more representative picture of the typical observation.
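To see how differently the two measures react to an extreme value, here is a minimal Python sketch using only the standard library. The return figures are hypothetical and chosen purely for illustration.

```python
from statistics import mean, median

# Hypothetical monthly returns in percent (illustrative values only).
typical_returns = [1.0, 2.0, 2.5, 3.0, 1.5]
# The same series with one extreme month appended.
with_outlier = typical_returns + [40.0]

mean_before = mean(typical_returns)
median_before = median(typical_returns)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"Without outlier: mean={mean_before:.2f}, median={median_before:.2f}")
print(f"With outlier:    mean={mean_after:.2f}, median={median_after:.2f}")
```

In this made-up series the single 40% month lifts the mean from 2.00 to about 8.33, while the median only moves from 2.00 to 2.25.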
Mode
The mode is the most frequently occurring value in a dataset. It’s most useful for discrete data, such as credit ratings, where you might want to know the most common rating in a bond portfolio. For continuous return data, the mode is rarely used because values seldom repeat exactly.
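A short sketch of both situations, assuming a hypothetical list of bond ratings and a hypothetical handful of returns. It uses Python's standard `statistics` module.

```python
from statistics import mode, multimode

# Hypothetical credit ratings for the bonds in a portfolio.
ratings = ["AA", "A", "BBB", "A", "BBB", "A", "BB"]
most_common_rating = mode(ratings)  # "A" appears three times

# Hypothetical continuous returns: no value repeats,
# so every observation ties as a "mode".
returns = [1.23, -0.47, 2.91, 0.08]
return_modes = multimode(returns)

print("Most common rating:", most_common_rating)
print("Modes of the return series:", return_modes)
```

For the ratings the mode is informative. For the returns, `multimode` simply hands back every value, which is why the mode is rarely reported for continuous data.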
Geometric Mean
The geometric mean measures the average compound growth rate over multiple periods. It’s essential for multi-period investment returns because it accounts for the compounding effect that the arithmetic mean ignores.
A key property: the geometric mean is always less than or equal to the arithmetic mean, with equality only when all values are identical. The greater the variability in returns, the larger this gap becomes.
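In symbols, the geometric mean return is the nth root of the product of (1 + r) over all n periods, minus 1. The sketch below is one way to compute it in Python. The function name `geometric_mean_return` and both return series are hypothetical, and it assumes no period loses more than 100%.

```python
import math
from statistics import mean

def geometric_mean_return(returns):
    """Compound average return per period; returns are decimals (0.05 = 5%).

    Assumes every gross return (1 + r) is non-negative.
    """
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0

# Two hypothetical series with the same arithmetic mean of 5%.
steady = [0.05, 0.05, 0.05]     # identical returns: the two means coincide
volatile = [0.30, -0.20, 0.05]  # more variability: the geometric mean falls below

gm_steady = geometric_mean_return(steady)
gm_volatile = geometric_mean_return(volatile)

print(f"steady:   arithmetic={mean(steady):.4f}, geometric={gm_steady:.4f}")
print(f"volatile: arithmetic={mean(volatile):.4f}, geometric={gm_volatile:.4f}")
```

With identical returns the two means agree. With the volatile series the geometric mean drops to roughly 2.98% even though the arithmetic mean is still 5%.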
| Measure | Best Used For | Limitation |
|---|---|---|
| Arithmetic Mean | Single-period expected returns, cross-sectional comparisons | Sensitive to outliers; overstates multi-period growth |
| Median | Skewed distributions, presence of outliers | Ignores distribution shape; less efficient statistically |
| Mode | Discrete data (ratings, categories) | Often undefined for continuous data |
| Geometric Mean | Multi-period compound returns | Returns -100% if any period has total loss |
Measures of Dispersion
While central tendency tells you where returns cluster, dispersion measures tell you how spread out they are. In finance, dispersion is directly related to risk — wider dispersion means greater uncertainty.
Range
The range is the simplest dispersion measure: the difference between the maximum and minimum values. While easy to calculate, it uses only two data points and is extremely sensitive to outliers.
Mean Absolute Deviation (MAD)
The mean absolute deviation is the average of the absolute differences between each observation and the mean. It provides a more intuitive measure of “typical” deviation than variance because it’s in the same units as the original data.
Note: The abbreviation MAD is also used for median absolute deviation (deviations from the median), which is more resistant to outliers. This article uses the mean-based version, which is more common in introductory finance applications.
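The sketch below computes the range and both versions of MAD for the twelve monthly returns used in this article's worked example. Variable names are illustrative, and the median-based figure is included only to show that the two definitions can give different answers.

```python
from statistics import mean, median

# Monthly returns in percent, taken from the article's worked example.
returns = [4, -2, 6, 1, -3, 5, 2, -1, 7, 3, -4, 2]

# Range: uses only the two most extreme observations.
value_range = max(returns) - min(returns)

# Mean absolute deviation: average distance from the arithmetic mean.
center = mean(returns)
mean_abs_dev = mean(abs(r - center) for r in returns)

# Median absolute deviation: median distance from the median.
mid = median(returns)
median_abs_dev = median(abs(r - mid) for r in returns)

print(f"Range: {value_range} percentage points")
print(f"Mean absolute deviation:   {mean_abs_dev:.2f}")
print(f"Median absolute deviation: {median_abs_dev:.2f}")
```

For this series the range is 11 percentage points, the mean-based MAD is about 2.89, and the median-based MAD is 3.00.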
Variance
Variance is the average of squared deviations from the mean. Squaring gives more weight to larger deviations and ensures all deviations are positive. The trade-off is that variance is expressed in squared units, making direct interpretation difficult.
Standard Deviation
Standard deviation is the square root of variance, bringing the measure back to the original units (percentage points for returns). It’s the most widely used dispersion measure in finance. For a detailed treatment of standard deviation as a risk metric — including annualization and portfolio applications — see our guide on standard deviation in finance.
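To illustrate how squaring weights large deviations more heavily, the following sketch compares two hypothetical series that have the same mean absolute deviation but different variances. The helper `mean_abs_deviation` is an illustrative name, and the sample (n – 1) formulas from Python's `statistics` module are assumed.

```python
from statistics import mean, stdev, variance

# Two hypothetical return series (percent), both with a mean of zero.
clustered = [-2, -2, 2, 2]   # every observation is 2 points from the mean
stretched = [-4, 0, 0, 4]    # two observations sit far from the mean

def mean_abs_deviation(data):
    """Average absolute distance from the arithmetic mean."""
    center = mean(data)
    return mean(abs(x - center) for x in data)

for name, series in (("clustered", clustered), ("stretched", stretched)):
    print(
        f"{name}: MAD={mean_abs_deviation(series):.2f}, "
        f"variance={variance(series):.2f} (percent squared), "
        f"std dev={stdev(series):.2f} (percent)"
    )
```

Both series have a mean absolute deviation of 2, but the variance of the stretched series is twice as large because its two big deviations are squared. Taking the square root returns the figure to percentage points.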
Coefficient of Variation
The coefficient of variation (CV) is the ratio of standard deviation to the mean, expressed as a percentage or decimal. It measures relative dispersion, allowing comparison across datasets with different scales or units.
The coefficient of variation is unreliable when the mean is close to zero or negative, and when the returns being compared are measured over different time horizons. Use it only when the mean is meaningfully positive and the data are measured consistently.
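One way to build that caution into code is to refuse to return a CV when the mean is not positive. The sketch below is an assumption-laden illustration: the function name, the `min_mean` cutoff parameter, and the fund data are all hypothetical, and what counts as "meaningfully positive" is left to the analyst.

```python
from statistics import mean, stdev

def coefficient_of_variation(returns, min_mean=0.0):
    """Sample standard deviation divided by the mean.

    Returns None when the mean is at or below min_mean, because the
    ratio is not interpretable there. min_mean is a hypothetical cutoff.
    """
    avg = mean(returns)
    if avg <= min_mean:
        return None
    return stdev(returns) / avg

# Hypothetical annual returns in percent.
fund_a = [6, 8, 10]   # mean 8, standard deviation 2
fund_b = [1, 4, 7]    # mean 4, standard deviation 3
flat = [-2, 0, 2]     # mean 0: CV is undefined

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)
cv_flat = coefficient_of_variation(flat)

print(f"Fund A CV: {cv_a:.2f}")
print(f"Fund B CV: {cv_b:.2f}")
print(f"Flat series CV: {cv_flat}")
```

Here Fund B carries three times as much dispersion per unit of return as Fund A (0.75 versus 0.25), and the zero-mean series is flagged rather than producing a division error or a misleading number.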
Sample vs Population Statistics
In statistics, a population includes every possible observation, while a sample is a subset drawn from the population. In finance, you almost always work with samples — you have historical returns for some period, not every possible return the asset could generate.
The distinction affects how you calculate variance and standard deviation:
Population Statistics
- Divide by N (total observations)
- Symbols: μ (mean), σ (standard deviation)
- Used when you have the entire dataset
- Rare in finance — requires complete information
Sample Statistics
- Variance/std dev divide by n – 1 (Bessel’s correction)
- Symbols: x̄ (mean), s (standard deviation)
- Used when estimating from historical data
- Standard practice in financial analysis
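Bessel’s correction (dividing by n – 1 instead of n) adjusts for the fact that deviations are measured from the sample mean rather than the unknown population mean, which makes the divide-by-n formula underestimate the true population variance on average. Using n – 1 provides an unbiased estimate of the population variance.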
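Python's standard `statistics` module exposes both conventions, which makes the difference easy to see. The sketch below applies them to the twelve monthly returns from this article's worked example. Variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

# Monthly returns in percent, taken from the article's worked example.
returns = [4, -2, 6, 1, -3, 5, 2, -1, 7, 3, -4, 2]
n = len(returns)

pop_var = pvariance(returns)   # divides the squared deviations by n
samp_var = variance(returns)   # divides by n - 1 (Bessel's correction)

print(f"Population variance: {pop_var:.2f}, std dev: {pstdev(returns):.2f}")
print(f"Sample variance:     {samp_var:.2f}, std dev: {stdev(returns):.2f}")
print(f"Ratio sample/population: {samp_var / pop_var:.4f} (n / (n - 1) = {n / (n - 1):.4f})")
```

The sample variance is always larger by the factor n / (n – 1), so the choice matters most for short return histories and fades as n grows.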
Quantiles and Percentiles
Quantiles divide an ordered dataset into equal parts. The most common are:
- Quartiles: Q1 (25th percentile), Q2 (median, 50th percentile), Q3 (75th percentile)
- Deciles: Divide data into 10 equal parts
- Percentiles: Divide data into 100 equal parts
In finance, percentiles are used to benchmark performance (a fund in the 90th percentile outperformed 90% of peers), analyze return distribution tails (the 5th percentile for downside risk), and construct quantile-based portfolios for factor investing.
Different software packages use different interpolation methods for percentiles, which can produce slightly different results, especially with small samples. Excel’s PERCENTILE.INC and PERCENTILE.EXC functions use different conventions; NumPy’s numpy.percentile defaults to linear interpolation, which corresponds to the PERCENTILE.INC convention, and supports several alternatives. When precision matters, verify which interpolation your tool uses.
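The effect of the interpolation convention is easy to reproduce with Python's standard library, whose `statistics.quantiles` function accepts `method="exclusive"` or `method="inclusive"`. The sketch below computes quartiles for the twelve monthly returns from this article's worked example. Treat the mapping of these two methods to any particular spreadsheet function as an assumption to verify in your own tool.

```python
from statistics import quantiles

# Monthly returns in percent, taken from the article's worked example.
monthly_returns_pct = [4, -2, 6, 1, -3, 5, 2, -1, 7, 3, -4, 2]

# n=4 asks for the three cut points that split the data into quartiles.
exclusive_quartiles = quantiles(monthly_returns_pct, n=4, method="exclusive")
inclusive_quartiles = quantiles(monthly_returns_pct, n=4, method="inclusive")

print("Exclusive method (Q1, Q2, Q3):", exclusive_quartiles)
print("Inclusive method (Q1, Q2, Q3):", inclusive_quartiles)
```

With only twelve observations, the first quartile is -1.75 under the exclusive method and -1.25 under the inclusive method, while the median is 2.0 either way.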
Interpreting Descriptive Statistics
Each descriptive statistic has strengths and blind spots. Use them together for a complete picture:
| Metric | What It Tells You | Watch Out For |
|---|---|---|
| Arithmetic Mean | Average return level | Outliers pull it away from the typical value |
| Geometric Mean | Compound growth rate | Returns -100% if any period has total loss |
| Median | Typical value, resistant to outliers | Ignores distribution shape and tails |
| Standard Deviation | Typical spread around the mean | Treats gains and losses symmetrically |
| Coefficient of Variation | Risk per unit of return | Meaningless if mean is near zero or negative |
| Percentiles | Position within a distribution | Requires sufficient sample size for reliability |
Always compare statistics calculated over the same time frequency. A monthly standard deviation cannot be directly compared to an annual figure — you must annualize first. See standard deviation in finance for annualization methods.
Descriptive Statistics Example
Let’s calculate key descriptive statistics for a stock’s 12 monthly returns.
A stock produced the following monthly returns:
| Month | Return |
|---|---|
| Jan | +4% |
| Feb | -2% |
| Mar | +6% |
| Apr | +1% |
| May | -3% |
| Jun | +5% |
| Jul | +2% |
| Aug | -1% |
| Sep | +7% |
| Oct | +3% |
| Nov | -4% |
| Dec | +2% |
Step 1: Arithmetic Mean
Sum = 4 + (-2) + 6 + 1 + (-3) + 5 + 2 + (-1) + 7 + 3 + (-4) + 2 = 20%
Arithmetic Mean = 20% / 12 = 1.67%
Step 2: Median
Sorted: -4, -3, -2, -1, +1, +2, +2, +3, +4, +5, +6, +7
Middle values (6th and 7th): +2 and +2
Median = (2 + 2) / 2 = 2.00%
Step 3: Geometric Mean
Product of gross returns = 1.04 × 0.98 × 1.06 × 1.01 × 0.97 × 1.05 × 1.02 × 0.99 × 1.07 × 1.03 × 0.96 × 1.02 = 1.2111
Geometric Mean = 1.2111^(1/12) – 1 = 1.61%
Step 4: Sample Variance
Sum of squared deviations from mean (1.67%) = 140.67
Sample Variance = 140.67 / 11 = 12.79 (percentage points squared)
Step 5: Sample Standard Deviation
s = √12.79 = 3.58%
Step 6: Coefficient of Variation
CV = 3.58% / 1.67% = 2.14 (using the rounded figures above; unrounded inputs give about 2.15)
Interpretation: The stock averaged 1.67% per month with a geometric mean of 1.61% (compound growth rate). The difference of 0.06 percentage points reflects the drag from return volatility. A CV of 2.14 means the monthly standard deviation is about 2.14 times the average monthly return.
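The six steps above can be cross-checked with a few lines of Python. This is a sketch using the standard `statistics` module and the sample (n – 1) formulas. Variable names are illustrative.

```python
import math
from statistics import mean, median, stdev, variance

# The twelve monthly returns from the table above, in percent.
monthly_returns_pct = [4, -2, 6, 1, -3, 5, 2, -1, 7, 3, -4, 2]
n = len(monthly_returns_pct)

arithmetic_mean = mean(monthly_returns_pct)
middle = median(monthly_returns_pct)

# Convert percent to gross returns (4% -> 1.04) before compounding.
growth = math.prod(1 + r / 100 for r in monthly_returns_pct)
geometric_mean = (growth ** (1 / n) - 1) * 100

sample_variance = variance(monthly_returns_pct)
sample_std = stdev(monthly_returns_pct)
cv = sample_std / arithmetic_mean

print(f"Arithmetic mean: {arithmetic_mean:.2f}%")
print(f"Median:          {middle:.2f}%")
print(f"Growth factor:   {growth:.4f}")
print(f"Geometric mean:  {geometric_mean:.2f}%")
print(f"Sample variance: {sample_variance:.2f}")
print(f"Sample std dev:  {sample_std:.2f}%")
print(f"CV:              {cv:.2f}")
```

Because the script carries full precision through to the last step, its CV comes out slightly higher than a figure computed from the rounded 3.58% and 1.67%.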
Arithmetic Mean vs Geometric Mean
Understanding when to use arithmetic versus geometric mean is one of the most important distinctions in investment analysis.
Arithmetic Mean
- Simple average of returns
- Best for single-period expected returns
- Overstates multi-period compound growth
- Used in CAPM and forward-looking projections
- Always ≥ geometric mean
Geometric Mean
- Compound average growth rate
- Accurate for multi-period realized returns
- Accounts for the effect of volatility
- Used for historical performance reporting
- Always ≤ arithmetic mean
Consider an investment with two annual returns: +50% in Year 1, -50% in Year 2.
- Arithmetic mean = (50% + (-50%)) / 2 = 0%
- Geometric mean = √(1.50 × 0.50) – 1 = √0.75 – 1 = -13.4%
If you invested $100:
- After Year 1: $100 × 1.50 = $150
- After Year 2: $150 × 0.50 = $75
You lost 25% of your investment, yet the arithmetic mean suggests you broke even. The geometric mean (-13.4% per year) correctly reflects the compound decline.
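The same two-year example, written as a short Python sketch so that each figure can be checked. Variable names are illustrative.

```python
start_value = 100.0
annual_returns = [0.50, -0.50]  # +50% in Year 1, -50% in Year 2

# Compound the investment year by year.
end_value = start_value
for r in annual_returns:
    end_value *= 1 + r

years = len(annual_returns)
arithmetic = sum(annual_returns) / years
geometric = (end_value / start_value) ** (1 / years) - 1

# Compounding at the geometric mean reproduces the actual ending value.
check_value = start_value * (1 + geometric) ** years

print(f"Ending value:    ${end_value:.2f}")
print(f"Arithmetic mean: {arithmetic:.1%}")
print(f"Geometric mean:  {geometric:.1%}")
print(f"Value implied by geometric mean: ${check_value:.2f}")
```

Growing $100 at the arithmetic mean of 0% would leave $100, which is not what happened. Growing it at the geometric mean for two years gives back the actual $75.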
The gap between arithmetic and geometric mean widens as return volatility increases. This phenomenon — called volatility drag — explains why high-volatility investments often underperform their “average” returns over time.
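A simple way to see the gap widen is to hold the arithmetic mean at zero and vary only the size of the swings. The sketch below assumes a hypothetical two-year pattern of +x followed by -x. The function name and the chosen swing sizes are illustrative.

```python
import math

def two_period_geometric_mean(swing):
    """Geometric mean of a +swing year followed by a -swing year (decimals)."""
    return math.sqrt((1 + swing) * (1 - swing)) - 1

# Hypothetical swing sizes; the arithmetic mean is 0% in every case.
swings = [0.10, 0.30, 0.50]
drag = [0.0 - two_period_geometric_mean(s) for s in swings]

for s, d in zip(swings, drag):
    print(f"+/-{s:.0%} swings: geometric mean {-d:.2%}, gap to arithmetic mean {d:.2%}")
```

The gap grows from about half a percentage point at 10% swings to more than 13 percentage points at 50% swings, even though the arithmetic mean never changes.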
How to Calculate Descriptive Statistics
Follow these steps to calculate key descriptive statistics for any return series:
1. Collect returns: Gather periodic returns (daily, monthly, annual) as percentages
2. Calculate arithmetic mean: Sum all returns and divide by the count
3. Sort and find median: Arrange returns in order; find the middle value(s)
4. Compute gross returns: Convert each return to a decimal and add 1 (e.g., 5% becomes 1.05)
5. Calculate geometric mean: Take the nth root of the product of gross returns, then subtract 1
6. Calculate deviations: Subtract the arithmetic mean from each return
7. Square and sum deviations: Square each deviation and add them together
8. Compute sample variance: Divide the sum of squared deviations by (n – 1)
9. Compute standard deviation: Take the square root of variance
10. Calculate CV: Divide standard deviation by the arithmetic mean (meaningful only when the mean is positive)
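The steps above can be sketched as a single Python function that performs each calculation by hand rather than calling a statistics library. The function name `describe_returns`, the dictionary keys, and the four-value input are hypothetical. The function assumes returns are supplied as decimals and that no period loses more than 100%.

```python
import math

def describe_returns(returns):
    """Follow the steps above for periodic returns given as decimals (0.05 = 5%)."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns to compute a sample variance")

    arithmetic_mean = sum(returns) / n

    ordered = sorted(returns)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    gross = [1 + r for r in returns]  # assumes every gross return is positive
    geometric_mean = math.prod(gross) ** (1 / n) - 1

    deviations = [r - arithmetic_mean for r in returns]
    sum_squared = sum(d * d for d in deviations)
    sample_variance = sum_squared / (n - 1)
    std_dev = math.sqrt(sample_variance)

    # CV is reported only when the mean is positive.
    cv = std_dev / arithmetic_mean if arithmetic_mean > 0 else None

    return {
        "arithmetic_mean": arithmetic_mean,
        "median": median,
        "geometric_mean": geometric_mean,
        "sample_variance": sample_variance,
        "std_dev": std_dev,
        "cv": cv,
    }

# Hypothetical four-period series for illustration.
summary = describe_returns([0.04, -0.02, 0.06, 0.01])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Writing the calculation out this way keeps each step visible. In practice a library such as Python's `statistics` module gives the same figures with less code.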
For portfolio-level applications of these statistics, see our guide on portfolio diversification.
Common Mistakes
Avoid these frequent errors when working with descriptive statistics in finance:
1. Using arithmetic mean for multi-period returns — The arithmetic mean overstates compound growth. Use the geometric mean when measuring how an investment actually performed over time.
2. Using population formulas for sample data — Dividing by n instead of (n – 1) underestimates variance. In finance, you’re almost always working with samples of historical returns, not complete populations.
3. Ignoring outliers — A single extreme return can significantly distort the arithmetic mean. Always check the median and look for outliers before drawing conclusions.
4. Comparing statistics across different frequencies — A monthly standard deviation of 5% is not comparable to an annual standard deviation of 15%. Convert to the same time frame before comparing.
5. Misusing coefficient of variation — CV is meaningless when the mean is near zero or negative. It’s also unreliable when comparing returns measured over different horizons.
6. Over-interpreting summary statistics from skewed data — Two distributions can have identical means and standard deviations but very different shapes. Summary statistics don’t capture skewness and kurtosis, which matter for understanding tail risk.
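The last point can be demonstrated with two small hypothetical datasets that share a mean and a standard deviation but differ in shape. The `skewness` helper below is an illustrative implementation of the population (moment-based) skewness, not a reference to any particular library's definition.

```python
from statistics import mean, stdev

# Two hypothetical return series (percent) built to share a mean and std dev.
skewed = [-1, -1, -1, -1, 4]     # many small losses, one large gain
symmetric = [-3, -1, 0, 1, 3]    # balanced around zero

def skewness(data):
    """Population skewness: third central moment over the 1.5 power of the second."""
    n = len(data)
    avg = sum(data) / n
    m2 = sum((x - avg) ** 2 for x in data) / n
    m3 = sum((x - avg) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

for name, series in (("skewed", skewed), ("symmetric", symmetric)):
    print(
        f"{name}: mean={mean(series):.2f}, std dev={stdev(series):.2f}, "
        f"skewness={skewness(series):.2f}"
    )
```

Both series report a mean of 0 and a sample standard deviation of about 2.24, yet one has a skewness of 1.5 and the other 0. The mean and standard deviation alone would not reveal that difference.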
Limitations of Descriptive Statistics
Descriptive statistics are powerful tools, but they have important limitations:
Descriptive statistics reduce a full distribution to a few numbers. Two datasets can have identical means, medians, and standard deviations but look completely different. Always visualize your data when possible — a histogram reveals patterns that summary statistics hide.
1. Backward-looking — Descriptive statistics summarize what happened, not what will happen. Past returns don’t guarantee future performance. Market regimes change, and historical statistics may not reflect current conditions.
2. No distributional shape information — Mean and standard deviation assume (or work best with) symmetric distributions. Financial returns often exhibit skewness (asymmetry) and fat tails (extreme events more frequent than normal distributions suggest). For these characteristics, see skewness and kurtosis in returns.
3. Time-ordering is lost — Descriptive statistics treat all observations equally regardless of sequence. They can’t capture momentum, mean reversion, or autocorrelation — patterns where the order of returns matters.
4. Symmetric treatment of gains and losses — Standard deviation penalizes large gains the same as large losses. Most investors care more about downside risk. For asymmetric risk measures, consider downside deviation or Value at Risk.
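The loss of time-ordering (point 3) is easy to demonstrate: rearranging a return series leaves its mean and standard deviation unchanged. The sketch below uses the twelve monthly returns from this article's worked example in two different orders. The `lag1_autocorrelation` helper is a hypothetical, simplified estimator included only to show that an order-aware measure does tell the two series apart.

```python
from statistics import mean, stdev

# The same twelve monthly returns (percent) in two different orders.
choppy = [4, -2, 6, 1, -3, 5, 2, -1, 7, 3, -4, 2]   # order from the worked example
trending = sorted(choppy)                            # steadily improving months

def lag1_autocorrelation(series):
    """Simplified lag-1 autocorrelation: how strongly each value follows the previous one."""
    avg = sum(series) / len(series)
    numerator = sum(
        (series[i] - avg) * (series[i + 1] - avg) for i in range(len(series) - 1)
    )
    denominator = sum((x - avg) ** 2 for x in series)
    return numerator / denominator

for name, series in (("choppy", choppy), ("trending", trending)):
    print(
        f"{name}: mean={mean(series):.2f}, std dev={stdev(series):.2f}, "
        f"lag-1 autocorrelation={lag1_autocorrelation(series):.2f}"
    )
```

The two orderings produce identical means and standard deviations, while the autocorrelation is negative for the original order and strongly positive for the sorted one.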
Descriptive statistics are the essential starting point for financial analysis, but they should never be your only tool. Combine them with distributional analysis (probability distributions), visualization, and forward-looking risk measures for a complete picture.
Disclaimer
This article is for educational and informational purposes only and does not constitute investment advice. Statistical measures cited are for illustrative purposes and may differ based on calculation methods, data sources, and time periods. Always conduct your own research and consult a qualified financial advisor before making investment decisions.