Week 5: Statistics for Quant Research

IP Anchor: Section 3b (Model Assumptions). This week teaches tests you must run and report. Every assumption you make goes in Section 3b of your IP.

What this week covers

Models have assumptions. When assumptions are violated, results are misleading — sometimes catastrophically. This week walks through the five major assumptions in quantitative finance, how to test each, and what to do when they fail. Every test here must appear in your Investment Proposal's Section 3b.

Why model assumptions matter in QR

Many analysts skip Section 3b because testing assumptions feels like busywork. It's not. Here's why:

Assume you build an OLS regression model to predict returns. The usual OLS t-tests and confidence intervals assume normally distributed residuals, at least in small samples. Your actual residuals have fat tails (returns have more extreme events than a normal distribution predicts). You report a Sharpe ratio without flagging this, so readers treat volatility as a complete description of risk. QT reads your backtest results and sizes positions based on the Sharpe. When the strategy goes live, the worst drawdown is 3x what a normal distribution would predict. The strategy has a negative Sharpe live, but your IP said positive. Why? Violated assumptions.

This is why Section 3b is mandatory. It forces you to document which assumptions you're making and which you've tested. If you document "residuals are non-normal with fat tails," QT can adjust their risk model accordingly.

Five critical assumptions

1. Normality of residuals

Many models (OLS, standard Sharpe ratio) assume residuals or returns are normally distributed. In finance, returns have fat tails — more extreme events than a normal distribution predicts.

Why it fails: Markets have crashes. The 1987 crash, 2008, 2020: these are 5+ standard deviation events. A normal distribution assigns essentially zero probability to them.
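To put a number on "essentially zero", the sketch below compares the probability of a 5 standard deviation daily move under a normal distribution with the same probability under a fat-tailed alternative. The Student-t with 4 degrees of freedom and the 252 trading days per year are illustrative assumptions, not values taken from this course.

```python
from scipy import stats

def normal_tail_prob(k):
    """Two-sided probability of a move of at least k standard deviations under normality."""
    return 2.0 * stats.norm.sf(k)

def student_t_tail_prob(k, df=4):
    """Same probability under a Student-t rescaled to unit variance (requires df > 2)."""
    scale = ((df - 2.0) / df) ** 0.5  # a t(df) variable has variance df / (df - 2)
    return 2.0 * stats.t.sf(k / scale, df)

TRADING_DAYS = 252  # assumed number of trading days per year

cases = [
    ("normal", normal_tail_prob(5)),
    ("student-t, df=4 (assumed)", student_t_tail_prob(5)),
]
for label, p in cases:
    years = 1.0 / (p * TRADING_DAYS)  # expected waiting time between such days
    print(f"{label}: P(|move| >= 5 sd) = {p:.2e}, about one day every {years:,.1f} years")
```

Under normality a 5 standard deviation day should show up roughly once in several thousand years of daily data, while the fat-tailed alternative makes it a routine occurrence. That gap is what Section 3b needs to make visible.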

Test: Jarque-Bera or Shapiro-Wilk

\[ JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \]

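Where n = sample size, S = sample skewness, K = sample kurtosis (raw, so K = 3 for a normal distribution). Under normality, JB ~ χ²(2) asymptotically, so the test is most reliable on large samples; Shapiro-Wilk is the better choice for small ones. p < 0.05 rejects normality at the 5% level.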
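The formula can be checked by computing it directly from the sample skewness and kurtosis. This is a minimal sketch on synthetic fat-tailed data (a Student-t sample, which is an assumption made only for illustration); the helper name `jarque_bera_by_hand` is hypothetical.

```python
import numpy as np
from scipy import stats

def jarque_bera_by_hand(x):
    """Jarque-Bera statistic and p-value computed from the formula above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    S = stats.skew(x)                     # biased sample skewness (scipy default)
    K = stats.kurtosis(x, fisher=False)   # raw kurtosis, equal to 3 under normality
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    p = stats.chi2.sf(jb, df=2)           # upper tail of chi-squared with 2 df
    return jb, p

rng = np.random.default_rng(0)
sample = rng.standard_t(df=4, size=2000)  # synthetic fat-tailed "residuals"

jb, p = jarque_bera_by_hand(sample)
ref = stats.jarque_bera(sample)           # library result for comparison

print(f"by hand: JB={jb:.4f}, p={p:.4g}")
print(f"scipy:   JB={ref[0]:.4f}, p={ref[1]:.4g}")
```

The two lines should agree, which confirms that the statistic is driven entirely by skewness and excess kurtosis. A large JB on return data usually comes from the kurtosis term.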

Python:

from scipy import stats

jb_stat, jb_p = stats.jarque_bera(residuals)
sw_stat, sw_p = stats.shapiro(residuals)

print(f"Jarque-Bera: stat={jb_stat:.4f}, p={jb_p:.4f}")
print(f"Shapiro-Wilk: stat={sw_stat:.4f}, p={sw_p:.4f}")
# p < 0.05 → reject normality

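If it fails: Report the test statistic and p-value. Non-normal residuals do not bias OLS coefficients, but small-sample t-tests and any risk number derived from a normal distribution become unreliable. Use robust or bootstrapped standard errors, and state fat-tail risk explicitly in your expected drawdowns.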
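One simple way to quantify fat-tail risk for the report is to compare the loss a fitted normal distribution implies at a given tail level with the loss actually observed in the sample. The sketch below does this for single-period returns, which is only a proxy for drawdown; the function name, the 0.1% level, and the synthetic data are all assumptions.

```python
import numpy as np
from scipy import stats

def tail_quantile_comparison(returns, level=0.001):
    """Return (normal-implied quantile, empirical quantile) at the given tail level."""
    r = np.asarray(returns, dtype=float)
    normal_q = stats.norm.ppf(level, loc=r.mean(), scale=r.std(ddof=1))
    empirical_q = np.quantile(r, level)
    return normal_q, empirical_q

rng = np.random.default_rng(1)
# Synthetic daily returns with fat tails (Student-t, df=4), for illustration only
fake_returns = 0.01 * rng.standard_t(df=4, size=20_000)

normal_q, empirical_q = tail_quantile_comparison(fake_returns)
print(f"normal-implied 0.1% quantile: {normal_q:.4f}")
print(f"empirical 0.1% quantile:      {empirical_q:.4f}")
print(f"ratio empirical / normal:     {empirical_q / normal_q:.2f}")
```

Reporting the ratio alongside the Jarque-Bera result gives QT a concrete scaling factor to consider, though a full drawdown analysis would also need to account for how losses cluster in time.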

2. Stationarity of time series

Most time series models require (weak) stationarity: the mean, variance, and autocovariance structure don't change over time. Price levels are almost never stationary. Returns usually are. Cointegration is a special case: a linear combination of two non-stationary series can be stationary.

Why it fails: You regress price on price (running OLS on price levels). Prices trend or wander like random walks. Regressing one non-stationary series on another produces spurious results: high R², significant coefficients, but no real relationship.

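Tests: Augmented Dickey-Fuller (ADF) and KPSS. The two have opposite null hypotheses (ADF: the series has a unit root; KPSS: the series is stationary), so run both and check that they agree.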

Python:

from statsmodels.tsa.stattools import adfuller, kpss

adf_result = adfuller(series, autolag='AIC')
print(f"ADF stat: {adf_result[0]:.4f}, p-value: {adf_result[1]:.4f}")
# p < 0.05 → reject H0 (unit root), series is stationary

kpss_result = kpss(series, regression='c', nlags='auto')
print(f"KPSS stat: {kpss_result[0]:.4f}, p-value: {kpss_result[1]:.4f}")
# p < 0.05 → reject H0 (stationarity), series is non-stationary

If it fails: Difference the series (use returns instead of prices) or use cointegration if appropriate.

3. No autocorrelation in residuals

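OLS assumes errors are uncorrelated with each other. If residuals at time t are correlated with residuals at time t-1, standard errors are wrong, and with positive autocorrelation you're underestimating uncertainty. Also: autocorrelated returns suggest momentum or mean reversion, which has signal implications.

Why it fails: Overlapping return windows, stale or illiquid prices, and omitted slow-moving variables all leave serial correlation in the residuals. Volatility clustering is a related but distinct effect: it shows up as autocorrelation in squared residuals and is covered under assumption 4.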

Test: Ljung-Box test

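\[ Q = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^2}{n-k} \sim \chi^2(m) \]

Where n = sample size, \(\hat{\rho}_k\) = sample autocorrelation at lag k, m = number of lags tested. The null hypothesis is no autocorrelation up to lag m.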

Python:

from statsmodels.stats.diagnostic import acorr_ljungbox

lb = acorr_ljungbox(residuals, lags=[10, 20], return_df=True)
print(lb)
# p < 0.05 → autocorrelation present

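If it fails: Use Newey-West HAC standard errors (robust to both heteroscedasticity and autocorrelation). The coefficient estimates stay the same; only the standard errors change.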

4. Homoscedasticity (constant variance)

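Volatility changes over time. If your model assumes constant variance but volatility clusters, standard errors are wrong and risk estimates are too. Non-constant variance is called heteroscedasticity; when the variance depends on its own recent past, it is usually described as ARCH or GARCH effects.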

Why it fails: Markets are calm, then they spike. Volatility regimes are real.

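Test: Breusch-Pagan, which checks whether residual variance depends on the regressors. For volatility clustering specifically, Engle's ARCH LM test (het_arch in statsmodels.stats.diagnostic) is the more direct check.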

Python:

from statsmodels.stats.diagnostic import het_breuschpagan

# exog is the regressor matrix from the fitted model and must include a constant column
bp_stat, bp_p, _, _ = het_breuschpagan(residuals, exog)
print(f"Breusch-Pagan: stat={bp_stat:.4f}, p={bp_p:.4f}")
# p < 0.05 → heteroscedasticity present

If it fails: Use heteroscedasticity-robust standard errors (HC3 or HC4) or model volatility with GARCH.

5. No multicollinearity

For models with multiple predictors: if predictors are highly correlated, coefficient estimates are unstable and unreliable. The model can't distinguish which predictor is doing the work.

Why it fails: You use correlated features. Momentum at 5 days and momentum at 6 days share five of their six daily returns, so they are very highly correlated.

Test: Variance Inflation Factor (VIF)

\[ VIF_j = \frac{1}{1 - R_j^2} \]

Where \(R_j^2\) is the R² from regressing predictor j on all the other predictors. VIF > 10 is problematic. VIF > 5 warrants investigation. VIF = 1 means no correlation with other predictors.
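The VIF formula can be reproduced by running the auxiliary regression by hand. The sketch below builds 5-day and 6-day momentum from synthetic iid returns (an assumption; real returns are not iid), adds one unrelated feature, and compares the hand-computed VIFs with statsmodels. `vif_by_hand` is a hypothetical helper.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_by_hand(X, j):
    """VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - resid.var() / y.var()    # centered R-squared of the auxiliary regression
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(5)
r = rng.standard_normal(3000)                          # synthetic daily returns
mom_5 = np.convolve(r, np.ones(5), mode="valid")[1:]   # trailing 5-day sum, aligned with mom_6
mom_6 = np.convolve(r, np.ones(6), mode="valid")       # trailing 6-day sum
other = rng.standard_normal(mom_6.size)                # unrelated feature
X = np.column_stack([mom_5, mom_6, other])

vifs = [vif_by_hand(X, j) for j in range(X.shape[1])]

# statsmodels comparison: the design matrix must carry its own constant column
Xc = np.column_stack([np.ones(X.shape[0]), X])
sm_vifs = [variance_inflation_factor(Xc, j + 1) for j in range(X.shape[1])]

for name, v, s in zip(["mom_5", "mom_6", "other"], vifs, sm_vifs):
    print(f"{name}: by hand = {v:.2f}, statsmodels = {s:.2f}")
```

With iid returns the squared correlation between the two momentum features is 5/6, so their population VIF is 6, above the investigation threshold. Dropping either one should bring the remaining VIFs back to about 1.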

Python:

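from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# X is a pandas DataFrame of predictors; VIF needs an intercept in the design matrix
X_const = add_constant(X)
vif = {col: variance_inflation_factor(X_const.values, i)
       for i, col in enumerate(X_const.columns) if col != "const"}
print(vif)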

If it fails: Drop one of the correlated features or combine them.

Chart: Example distribution with fat tails

Blue bars: actual returns. Orange curve: normal distribution fitted to these returns. Notice the tails: actual returns have more extreme events than a normal distribution would predict. This is why the Jarque-Bera test rejects normality and why fat-tail risk is real.

Common mistakes

Five statistical sins

Using price levels instead of returns in regression. Price levels are non-stationary. Always test with ADF first. If non-stationary, use returns or differences.
Ignoring fat tails because "it's close enough" to normal. It's not. Document the deviation from normality in Section 3b. Report expected extreme drawdown.
Running ADF but not reporting it in Section 3b. If you run the test, report the result and p-value in your IP. QT needs to know you validated assumptions.
Building a multi-factor model without checking VIF. Correlated factors produce unstable coefficient estimates that can shift or flip sign out of sample, and nothing in the in-sample fit warns you. Check VIF before submitting.
Skipping assumptions entirely. Section 3b is mandatory. At minimum: test residuals for normality and autocorrelation, test series for stationarity. Report results.
