Test and document every model assumption. This IS Section 3b of your IP. The most critical section for QT risk management.
This is the most technically detailed week. Section 3b is a direct reference for completing the assumption-testing portion of every IP. You will learn to test five critical assumptions (covered in Week 3), plus the special case of cointegration for pairs trading. Every test, result, and implication goes in Section 3b.
Copy this checklist into your IP. Complete every row. Report the test, the result, and what you'll do if it fails.
| Assumption | Test | Python | Fail condition | If it fails |
|---|---|---|---|---|
| Normality of residuals | Jarque-Bera, Shapiro-Wilk | scipy.stats.jarque_bera | p < 0.05 | Use robust methods; report fat-tail risk |
| Stationarity | ADF, KPSS | statsmodels.tsa.stattools.adfuller | ADF p > 0.05 (or KPSS p < 0.05) | Difference series; use returns not prices |
| No autocorrelation | Ljung-Box | statsmodels.stats.diagnostic.acorr_ljungbox | p < 0.05 | Use Newey-West HAC standard errors |
| Homoscedasticity | Breusch-Pagan | statsmodels.stats.diagnostic.het_breuschpagan | p < 0.05 | Use HC3/HC4 robust SE or GARCH model |
| No multicollinearity | VIF | statsmodels.stats.outliers_influence.variance_inflation_factor | VIF > 10 | Drop or combine correlated features |
For pairs strategies, you're not testing stationarity of individual series — you're testing stationarity of the spread (the linear combination).
Two series are cointegrated if a linear combination of them is stationary.
Method:
Python:
from statsmodels.tsa.stattools import coint
score, p_value, critical_values = coint(series_a, series_b)
print(f"Cointegration test p-value: {p_value:.4f}")
# p < 0.05 → cointegrated at 5% significance
For strategies with 3+ assets, use Johansen's cointegration test.
\[ \Delta \mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{y}_{t-i} + \varepsilon_t \]
Interpretation: The rank of Π determines the number of cointegrating relationships (linearly independent stationary combinations).
SECTION 3B: MODEL ASSUMPTIONS — Ridge Regression Commodity Strategy
Strategy uses a ridge regression to predict corn returns from 3 factors:
3b.1 Normality of residuals
Test: Jarque-Bera on regression residuals (2020–2023 in-sample)
Result: JB stat = 12.4, p-value = 0.002 → reject normality
Implication: Residuals have fat tails. Daily returns exhibit kurtosis = 4.2 (excess = 1.2). Documented.
Action: Report expected maximum drawdown assuming fat tails. Use robust standard errors (HC3).
3b.2 Stationarity of predictors
Test: ADF on each factor
All factors are stationary. No issues.
3b.3 No autocorrelation in residuals
Test: Ljung-Box on residuals, lags 1–20
Result: All p-values > 0.05. No significant autocorrelation detected. ✓
3b.4 Homoscedasticity
Test: Breusch-Pagan heteroscedasticity test
Result: BP stat = 18.3, p = 0.0003 → heteroscedasticity present
Implication: Volatility clusters (expected in commodity markets), so ordinary OLS standard errors are unreliable.
Action: Use HC3 robust standard errors for inference.
3b.5 No multicollinearity
Test: Variance Inflation Factor on 3 factors
All VIF < 2. No multicollinearity concerns. ✓
Summary: Ridge regression suitable. Primary concerns: fat-tail risk (documented), heteroscedasticity (robust SE applied). Model assumptions documented and acceptable for live trading.
from scipy import stats

# residuals: 1-D array of regression residuals
jb_stat, jb_p = stats.jarque_bera(residuals)
print(f"Jarque-Bera: stat={jb_stat:.4f}, p={jb_p:.4f}")
if jb_p < 0.05:
    print("Residuals are NOT normally distributed (fat tails likely)")
else:
    print("Residuals are consistent with normality")
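The checklist also names Shapiro-Wilk, and the worked example reports kurtosis in both raw and excess form. The sketch below shows one way to compute these with scipy; the fat-tailed Student-t residuals are synthetic and stand in for real regression residuals.

```python
import numpy as np
from scipy import stats

# Synthetic residuals (assumption): Student-t with 3 degrees of freedom, i.e. fat-tailed.
rng = np.random.default_rng(0)
residuals = rng.standard_t(df=3, size=1000)

# Shapiro-Wilk: the null hypothesis is normality, so it reads like Jarque-Bera.
sw_stat, sw_p = stats.shapiro(residuals)

# Kurtosis in both conventions: a normal distribution has kurtosis 3,
# which corresponds to excess kurtosis 0.
kurt = stats.kurtosis(residuals, fisher=False)
excess_kurt = stats.kurtosis(residuals, fisher=True)

print(f"Shapiro-Wilk: stat={sw_stat:.4f}, p={sw_p:.4f}")
print(f"Kurtosis: {kurt:.2f} (excess: {excess_kurt:.2f})")
```

scipy warns that Shapiro-Wilk p-values may be inaccurate for samples above 5000 observations, so Jarque-Bera is the more practical choice for long return histories.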
from statsmodels.tsa.stattools import adfuller

# series: 1-D array or pandas Series (e.g. one predictor)
result = adfuller(series, autolag='AIC')
print(f"ADF stat: {result[0]:.4f}, p-value: {result[1]:.4f}")
if result[1] < 0.05:
    print("Series is stationary (reject unit root)")
else:
    print("Series is non-stationary (cannot reject unit root)")
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd

# X: pandas DataFrame of features. Note: VIFs are computed on the columns as
# given, so whether X includes a constant column affects the values.
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
vif_data["VIF"] = [
    variance_inflation_factor(X.values, i)
    for i in range(X.shape[1])
]
print(vif_data)
if (vif_data["VIF"] > 10).any():
    print("Multicollinearity detected. Consider dropping features.")