Run backtests correctly, interpret results honestly, and document your out-of-sample validation. This IS Section 4 of your IP.
This week teaches the four required metrics for any backtest (Sharpe, annual return, win rate, profit factor), how to interpret an equity curve, the critical distinction between in-sample and out-of-sample results, and the three backtest sins to avoid. Section 4 is where you report everything.
Annualized return per unit of risk. Multiply by √252 for daily data.
Interpretation:
Critical caveat: In-sample Sharpe is always higher than live Sharpe. A backtest Sharpe of 1.5 means a live Sharpe of ~1.0–1.2. Assume 20–30% degradation.
Compound growth rate annualized. Report gross (before costs) and net (after transaction costs). If they differ by > 1–2%, transaction costs are material.
Win Rate: Percentage of profitable trades.
\[ \text{Win Rate} = \frac{\text{# winning trades}}{\text{# total trades}} \]Profit Factor: Ratio of gross profit to gross loss.
\[ \text{Profit Factor} = \frac{\sum \text{winning P&L}}{\sum |\text{losing P&L}|} \]Why both matter: Win rate alone is meaningless. A strategy that wins 30% of the time can be profitable if winners are 3.5x the size of losers. A strategy that wins 70% of the time can lose money if losers are 2.5x the size of winners.
The equity curve is the cumulative P&L over time. It shows HOW you got to the final return, not just that you did. Look for:
In-sample: Data used to develop the strategy (optimize parameters, train models).
Out-of-sample: Data the researcher never saw before backtest.
In-sample Sharpe is almost always higher than OOS Sharpe. This is not a bug — it's mathematics. You optimized to the in-sample data.
Report rule: Always report both IS and OOS metrics. If OOS << IS, you overfit. Example: "In-sample Sharpe 1.45, hold-out Sharpe 0.92" suggests 36% degradation (normal). "In-sample Sharpe 2.1, hold-out Sharpe 0.3" suggests severe overfitting.
Using future data to generate signals for the past. The most common mistake.
Example: Using the month's final close on the 1st of the month to generate signals for trades on the 1st. You're using data from the future (end of month) to trade at the beginning.
How to avoid: Use only data available at decision time. If your signal fires on day T, use only data available on day T (market close or earlier).
Testing 1,000 combinations of lookback window, threshold, and entry/exit logic. Reporting the best result. By chance, something will look great.
Math: If you test n parameter combinations, the expected maximum Sharpe by chance is roughly 0.5 + 0.1√log(n). For 100 combinations: ~0.9 expected by noise alone.
How to avoid: (1) Specify signal parameters in advance based on economic theory. (2) If you must do parameter search, test on walk-forward OOS data, not in-sample. (3) Report Deflated Sharpe Ratio: corrects for multiple testing.
Every fill has slippage. Every trade has commission. For high-frequency signals, costs can eliminate all alpha.
commission_bps = 2 # 2 basis points per side
slippage_bps = 3 # 3 basis points estimated slippage
total_cost_bps = commission_bps + slippage_bps
# Round-trip cost (open and close): 2 × total_cost
df['net_return'] = df['gross_return'] - (df['trade_flag'] * 2 * total_cost_bps / 10000)
Rule of thumb: If your Sharpe drops by > 20% after adding transaction costs, the edge is marginal.
SECTION 4: BACKTEST RESULTS — Corn Futures Strategy
4.1 Performance Metrics (2010–2024 in-sample)
4.2 Out-of-Sample Validation
Hold-out period: 2023–2024 (final 1/7 of dataset, untouched during development)
Degradation is normal and expected. The OOS Sharpe of 0.87 is solid.
4.3 Risk Analysis
Sortino (1.76) > Sharpe (1.07) indicates downside volatility is well-managed — most volatility comes from upside swings. Calmar (0.86) suggests moderate drawdown recovery. Maximum single drawdown of -16.37% lasted approximately 4 months (recovery by month 5).
4.4 Transaction Cost Sensitivity
Assuming 5bps round-trip cost (commission + slippage), net Sharpe = 0.98. At 10bps cost, Sharpe drops to 0.88. Edge survives transaction costs.