Understand the statistical assumptions behind the models you'll use. Know how to test them and what to do when they fail.
Models have assumptions. When assumptions are violated, results are misleading — sometimes catastrophically. This week walks through the five major assumptions in quantitative finance, how to test each, and what to do when they fail. Every test here must appear in your Investment Proposal's Section 3b.
Many analysts skip Section 3b because testing assumptions feels like busywork. It's not. Here's why:
Suppose you build an OLS regression model to predict returns. Standard OLS inference assumes normally distributed residuals. Your actual residuals have fat tails (more extreme events than a normal distribution predicts). You report a Sharpe ratio assuming normality. QT reads your backtest results and sizes positions based on that Sharpe. When the strategy goes live, the worst drawdown is 3x what a normal distribution would predict. The strategy has a negative Sharpe live, but your IP said positive. Why? Violated assumptions.
This is why Section 3b is mandatory. It forces you to document which assumptions you're making and which you've tested. If you document "residuals are non-normal with fat tails," QT can adjust their risk model accordingly.
Many models (OLS, standard Sharpe ratio) assume residuals or returns are normally distributed. In finance, returns have fat tails — more extreme events than a normal distribution predicts.
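Why it fails: Markets have crashes. The 1987 crash, 2008, and 2020 all included 5+ standard deviation events. A normal distribution assigns essentially zero probability to these: a move beyond 5 standard deviations in either direction should occur about once in 1.7 million observations.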
Test: Jarque-Bera or Shapiro-Wilk
\[ JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \]
Where n = number of observations, S = sample skewness, K = sample kurtosis (K = 3 for a normal distribution). Under normality, JB is asymptotically χ²(2). p < 0.05 rejects normality.
Python:
from scipy import stats
jb_stat, jb_p = stats.jarque_bera(residuals)
# Shapiro-Wilk p-values may be inaccurate above 5,000 observations (SciPy warns in that case)
sw_stat, sw_p = stats.shapiro(residuals)
print(f"Jarque-Bera: stat={jb_stat:.4f}, p={jb_p:.4f}")
print(f"Shapiro-Wilk: stat={sw_stat:.4f}, p={sw_p:.4f}")
# p < 0.05 → reject normality
If it fails: Report it. Use robust standard errors. Mention fat-tail risk in your expected drawdowns.
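As a rough check on the Jarque-Bera formula above, the statistic can be recomputed by hand from the sample skewness and kurtosis and compared with SciPy's result. This is only a sketch: the residuals are simulated Student-t draws (an assumption made purely to get fat tails), not output from a real model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical residuals: Student-t with 3 degrees of freedom, chosen only for its fat tails
residuals = rng.standard_t(df=3, size=2000)

n = residuals.size
S = stats.skew(residuals)                    # sample skewness
K = stats.kurtosis(residuals, fisher=False)  # raw kurtosis (3 for a normal distribution)

# JB = n/6 * (S^2 + (K - 3)^2 / 4), compared against chi-squared with 2 degrees of freedom
jb_manual = n / 6 * (S**2 + (K - 3) ** 2 / 4)
p_manual = stats.chi2.sf(jb_manual, df=2)

jb_scipy, p_scipy = stats.jarque_bera(residuals)
print(f"manual: stat={jb_manual:.2f}, p={p_manual:.4g}")
print(f"scipy:  stat={jb_scipy:.2f}, p={p_scipy:.4g}")
```

The two statistics should agree because SciPy uses the same biased moment estimators that `stats.skew` and `stats.kurtosis` return by default.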
Time series models require stationarity — the mean and variance don't change over time. Price levels are almost never stationary. Returns usually are. Cointegration is a special case: a linear combination of two non-stationary series can be stationary.
Why it fails: You regress price on price (running OLS on price levels). Prices trend. Non-stationary series produce spurious regressions — high R², significant coefficients, but meaningless results.
Tests: Augmented Dickey-Fuller (ADF), KPSS
Python:
from statsmodels.tsa.stattools import adfuller, kpss
adf_result = adfuller(series, autolag='AIC')
print(f"ADF stat: {adf_result[0]:.4f}, p-value: {adf_result[1]:.4f}")
# p < 0.05 → reject H0 (unit root), series is stationary
kpss_result = kpss(series, regression='c', nlags='auto')
print(f"KPSS stat: {kpss_result[0]:.4f}, p-value: {kpss_result[1]:.4f}")
# p < 0.05 → reject H0 (stationarity), series is non-stationary
If it fails: Difference the series (use returns instead of prices) or use cointegration if appropriate.
OLS assumes errors are uncorrelated with each other. If residuals at time t are correlated with residuals at time t-1, standard errors are wrong — you're underestimating uncertainty. Also: autocorrelated returns suggest momentum or mean reversion, which has signal implications.
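Why it fails: Momentum and mean reversion make today's residual depend on yesterday's. Volatility also clusters: after a big move, another big move is more likely. That shows up as autocorrelation in squared residuals rather than in the residuals themselves.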
Test: Ljung-Box test
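\[ Q = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^2}{n-k} \sim \chi^2(m) \]
Where n = number of observations, \(\hat{\rho}_k\) = sample autocorrelation at lag k, m = number of lags tested. The χ²(m) distribution holds under the null hypothesis of no autocorrelation.
Python: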
from statsmodels.stats.diagnostic import acorr_ljungbox
lb = acorr_ljungbox(residuals, lags=[10, 20], return_df=True)
print(lb)
# p < 0.05 → autocorrelation present
If it fails: Use Newey-West HAC standard errors (robust to autocorrelation).
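Volatility changes over time. If your model assumes constant variance but volatility clusters, standard errors are wrong and so are risk estimates. Non-constant variance is called heteroscedasticity; when today's variance depends on past shocks, as in volatility clustering, the pattern is known as ARCH or GARCH effects.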
Why it fails: Markets are calm, then they spike. Volatility regimes are real.
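Test: Breusch-Pagan (checks whether residual variance depends on the regressors). For volatility clustering specifically, Engle's ARCH-LM test (het_arch in statsmodels) is the more direct check.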
Python:
from statsmodels.stats.diagnostic import het_breuschpagan
# exog is the regressor matrix from the fitted model and must include a constant column
bp_stat, bp_p, _, _ = het_breuschpagan(residuals, exog)
print(f"Breusch-Pagan: stat={bp_stat:.4f}, p={bp_p:.4f}")
# p < 0.05 → heteroscedasticity present
If it fails: Use heteroscedasticity-robust standard errors (HC3 or HC4) or model volatility with GARCH.
For models with multiple predictors: if predictors are highly correlated, coefficient estimates are unstable and unreliable. The model can't distinguish which predictor is doing the work.
Why it fails: You use correlated features. Momentum at 5 days and momentum at 6 days share five of their six daily returns, so they are very highly correlated (around 0.9 even when daily returns are independent).
Test: Variance Inflation Factor (VIF)
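\[ VIF_j = \frac{1}{1 - R_j^2} \]
Where \(R_j^2\) is the R² from regressing predictor j on all the other predictors. As rules of thumb: VIF > 10 is problematic. VIF > 5 warrants investigation. VIF = 1 means no correlation with other predictors.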
Python:
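from statsmodels.stats.outliers_influence import variance_inflation_factor
# X is a DataFrame of predictors that includes a constant column;
# without one, the auxiliary regressions have no intercept and the VIFs can mislead
vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif)
# ignore the VIF reported for the constant column itself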
If it fails: Drop one of the correlated features or combine them.
Figure: histogram of actual returns (blue bars) with a normal distribution fitted to those returns (orange curve). Notice the tails: actual returns have more extreme events than the normal curve predicts. This is why the Jarque-Bera test rejects normality and why fat-tail risk is real.