IP Anchor Section 4: Backtest results

The final section. Every metric must be reported: Sharpe, return, win rate, profit factor, drawdown, and OOS validation. No hand-waving.

What this week covers

This week teaches the four required metrics for any backtest (Sharpe, annual return, win rate, profit factor), how to interpret an equity curve, the critical distinction between in-sample and out-of-sample results, and the three backtest sins to avoid. Section 4 is where you report everything.

The four required metrics

1. Sharpe Ratio

\[ \text{Sharpe} = \frac{E[R_p] - R_f}{\sigma_p} \cdot \sqrt{252} \]

Annualized excess return per unit of risk. With daily data, E[R_p], R_f, and σ_p are daily figures, and the √252 factor annualizes the ratio.
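As a minimal sketch of the formula, assuming daily returns in a plain Python list and a constant daily risk-free rate (the function name `annualized_sharpe` and the sample numbers are illustrative, not taken from any library):

```python
import math
import statistics

TRADING_DAYS = 252  # annualization factor from the formula above


def annualized_sharpe(daily_returns, daily_rf=0.0):
    """Annualized Sharpe ratio from a sequence of daily returns."""
    excess = [r - daily_rf for r in daily_returns]
    mean_excess = statistics.fmean(excess)
    vol = statistics.stdev(excess)  # sample standard deviation
    if vol == 0:
        raise ValueError("volatility is zero; Sharpe is undefined")
    return mean_excess / vol * math.sqrt(TRADING_DAYS)


# Hypothetical daily returns, for illustration only
sample_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002, -0.003, 0.005]
sharpe = annualized_sharpe(sample_returns)
print(f"Annualized Sharpe: {sharpe:.2f}")
```

Eight observations are far too few for a meaningful estimate; the point is only to show where the mean, the standard deviation, and the √252 factor enter.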

Interpretation:

  • < 0.5: Weak edge
  • 0.5 – 1.0: Acceptable (borderline)
  • 1.0 – 1.5: Strong edge
  • 1.5 – 2.0: Very strong edge (check for overfitting)
  • > 2.0: Almost certainly overfit

Critical caveat: In-sample Sharpe is almost always higher than live Sharpe. Assume 20–30% degradation: a backtest Sharpe of 1.5 implies a live Sharpe of roughly 1.05–1.2.

2. Annualized Return

\[ R_{\text{ann}} = \left(\prod_{t=1}^{T}(1 + r_t)\right)^{252/T} - 1 \]

Compound growth rate, annualized. Report gross (before costs) and net (after transaction costs). If they differ by more than 1–2 percentage points per year, transaction costs are material.
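A short sketch of the compounding formula, reporting gross and net side by side. The return series and the per-day cost drag are made-up numbers chosen only to illustrate the calculation:

```python
import math


def annualized_return(daily_returns, periods_per_year=252):
    """Compound the daily returns, then rescale the exponent to one year."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (periods_per_year / len(daily_returns)) - 1.0


# Hypothetical series: 252 trading days at 0.05% per day gross,
# with an assumed average cost drag of 0.005% per day
gross_daily = [0.0005] * 252
net_daily = [r - 0.00005 for r in gross_daily]

gross_ann = annualized_return(gross_daily)
net_ann = annualized_return(net_daily)
print(f"gross {gross_ann:.2%}, net {net_ann:.2%}, gap {gross_ann - net_ann:.2%}")
```

The printed gap is the number to compare against the 1–2% materiality threshold.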

3. Win Rate & Profit Factor

Win Rate: Percentage of profitable trades.

\[ \text{Win Rate} = \frac{\text{# winning trades}}{\text{# total trades}} \]

Profit Factor: Ratio of gross profit to gross loss.

\[ \text{Profit Factor} = \frac{\sum \text{winning P&L}}{\sum |\text{losing P&L}|} \]

Why both matter: Win rate alone is meaningless. A strategy that wins 30% of the time can be profitable if winners are 3.5x the size of losers. A strategy that wins 70% of the time can lose money if losers are 2.5x the size of winners.

  • Profit Factor > 1.0: Strategy is profitable
  • Profit Factor > 1.5 and Win Rate > 50%: Strong edge
  • Profit Factor > 2.0 and Win Rate > 40%: Very strong edge (check for overfitting)
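The two examples above (30% winners at 3.5x, 70% winners with 2.5x losers) can be checked directly. This is a small sketch assuming per-trade P&L in a list; the trade lists are constructed to match those two cases, and a trade with exactly zero P&L is counted as not winning:

```python
def win_rate(trade_pnls):
    """Fraction of trades with strictly positive P&L."""
    wins = sum(1 for p in trade_pnls if p > 0)
    return wins / len(trade_pnls)


def profit_factor(trade_pnls):
    """Gross profit divided by the absolute value of gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = sum(-p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades at all
    return gross_profit / gross_loss


# 30% winners at 3.5 units each, 70% losers at 1 unit each
low_win = [3.5] * 3 + [-1.0] * 7
# 70% winners at 1 unit each, 30% losers at 2.5 units each
high_win = [1.0] * 7 + [-2.5] * 3

print(f"{win_rate(low_win):.0%} win rate, PF {profit_factor(low_win):.2f}")    # 30% win rate, PF 1.50
print(f"{win_rate(high_win):.0%} win rate, PF {profit_factor(high_win):.2f}")  # 70% win rate, PF 0.93
```

The low-win-rate strategy makes money and the high-win-rate one loses it, which is why the two numbers have to be read together.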

4. Equity Curve & Drawdown

The equity curve is the cumulative P&L over time. It shows how you got to the final return, not just that you did. Look for the path to returns, drawdown depth and recovery speed, and regime changes.
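Maximum drawdown is one quantity worth reading off the curve. The sketch below assumes a compounded equity curve starting from 1.0 (an additive cumulative-P&L curve would need the drawdown measured in currency rather than as a fraction); the function names and the four sample returns are illustrative:

```python
def equity_curve(returns, start=1.0):
    """Compounded equity path, including the starting value."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # decline from that mark
    return worst


returns = [0.10, -0.20, 0.05, 0.10]
curve = equity_curve(returns)
mdd = max_drawdown(curve)
print(f"Max drawdown: {mdd:.2%}")
```

Here the curve peaks after the first period and the 20% loss that follows sets the maximum drawdown; the later gains do not reach a new high.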

In-sample vs. out-of-sample

In-sample: Data used to develop the strategy (optimize parameters, train models).

Out-of-sample: Data that played no part in developing the strategy. The researcher does not look at it until the final evaluation.

In-sample Sharpe is almost always higher than OOS Sharpe. This is not a bug — it's mathematics. You optimized to the in-sample data.

Walk-forward validation (best practice)

  1. Training period (first 60%): Optimize signal parameters, train model
  2. Validation period (next 20%): Test parameter robustness (tune if needed, but do not re-optimize on the hold-out data)
  3. Hold-out period (final 20%): Completely untouched until final evaluation. This is your OOS Sharpe.
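The 60/20/20 split above can be sketched as a chronological slice. This assumes the rows are already sorted by time; the function name `chronological_split` is hypothetical, and integer percentages are used to avoid floating-point boundary surprises:

```python
def chronological_split(rows, train_pct=60, valid_pct=20):
    """Split time-ordered rows into train, validation, and hold-out. No shuffling."""
    n = len(rows)
    train_end = n * train_pct // 100
    valid_end = n * (train_pct + valid_pct) // 100
    return rows[:train_end], rows[train_end:valid_end], rows[valid_end:]


days = list(range(1000))  # stand-in for 1,000 time-ordered observations
train, valid, holdout = chronological_split(days)
print(len(train), len(valid), len(holdout))  # 600 200 200
```

The key property is that every training row precedes every validation row, which in turn precedes every hold-out row.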

Report rule: Always report both IS and OOS metrics. If OOS << IS, you overfit. Example: "In-sample Sharpe 1.45, hold-out Sharpe 0.92" is about 37% degradation, somewhat above the 20–30% rule of thumb but not alarming. "In-sample Sharpe 2.1, hold-out Sharpe 0.3" is about 86% degradation and suggests severe overfitting.
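For reference, degradation here is taken to mean the relative drop from the in-sample Sharpe to the hold-out Sharpe. A tiny helper (the name is illustrative) applied to the two examples:

```python
def sharpe_degradation(is_sharpe, oos_sharpe):
    """Relative drop from in-sample to out-of-sample Sharpe."""
    return (is_sharpe - oos_sharpe) / is_sharpe


mild = sharpe_degradation(1.45, 0.92)
severe = sharpe_degradation(2.1, 0.3)
print(f"{mild:.0%} and {severe:.0%}")  # 37% and 86%
```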

The three backtest sins

Sin 1: Look-ahead bias

Using future data to generate signals for the past. The most common mistake.

Example: Generating signals for trades on the 1st of the month from that month's final close. The close is not known until month-end, so the backtest trades at the start of the month on data from the future.

How to avoid: Use only data available at decision time. If your signal fires on day T, use only data available on day T (market close or earlier).
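One common guard is to lag the signal by one bar, so that a signal computed from day T's close is first applied to day T+1's return. The sketch below uses plain lists and a deliberately naive toy signal; the prices and the helper `lag_signal` are made up for illustration:

```python
def lag_signal(signals, lag=1):
    """Delay signals so the position on day T uses the signal from day T - lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1 to avoid look-ahead")
    padded = [0] * lag + list(signals)
    return padded[:len(signals)]


closes = [100.0, 101.0, 99.0, 102.0, 103.0]
# Close-to-close returns; the first day has no prior close
returns = [0.0] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
# Toy signal known only at each close: long (1) if the close rose, else flat (0)
signals = [1 if r > 0 else 0 for r in returns]

positions = lag_signal(signals)                             # tradable
strategy_returns = [p * r for p, r in zip(positions, returns)]
biased_returns = [s * r for s, r in zip(signals, returns)]  # look-ahead version

print(positions)  # [0, 0, 1, 0, 1]
```

The biased version earns every up day and skips every down day, a result no real trader could achieve. With a pandas Series the same lag is usually written `signal.shift(1)`.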

Sin 2: Overfitting via parameter sweeps

Testing 1,000 combinations of lookback window, threshold, and entry/exit logic. Reporting the best result. By chance, something will look great.

Math: If you test n parameter combinations, a rough heuristic for the expected maximum Sharpe from chance alone is 0.5 + 0.1√ln(n). For 100 combinations that is about 0.7 from noise alone, and the figure keeps rising as n grows.
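The effect is easy to see in simulation. The sketch below is not the heuristic above; it simply generates strategies with zero true edge (i.i.d. Gaussian daily returns, one year each, both assumptions of this example) and reports the best Sharpe found as the number of trials grows:

```python
import math
import random
import statistics


def noise_sharpe(rng, n_days=252):
    """Annualized Sharpe of a zero-edge strategy: i.i.d. daily returns with mean 0."""
    daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    return statistics.fmean(daily) / statistics.stdev(daily) * math.sqrt(252)


rng = random.Random(42)  # fixed seed so the run is repeatable
sharpes = [noise_sharpe(rng) for _ in range(1000)]

best_of_10 = max(sharpes[:10])
best_of_100 = max(sharpes[:100])
best_of_1000 = max(sharpes)
print(f"best of 10: {best_of_10:.2f}, of 100: {best_of_100:.2f}, of 1000: {best_of_1000:.2f}")
```

The exact numbers depend on the backtest length and the seed, but the best-of-n Sharpe can only go up as more combinations are tried, even though none of the strategies has any edge.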

How to avoid: (1) Specify signal parameters in advance based on economic theory. (2) If you must do a parameter search, evaluate it on walk-forward OOS data, not in-sample. (3) Report the Deflated Sharpe Ratio, which corrects for multiple testing.

Sin 3: Ignoring transaction costs

Every fill has slippage. Every trade has commission. For high-frequency signals, costs can eliminate all alpha.

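# Assumes df is a pandas DataFrame with a 'gross_return' column and a
# 'trade_flag' column that is 1 on rows where a round trip is charged, else 0.
commission_bps = 2   # 2 basis points per side
slippage_bps = 3     # 3 basis points estimated slippage per side
total_cost_bps = commission_bps + slippage_bps  # 5 bps per side

# Round-trip cost (open and close): 2 × total_cost_bps = 10 bps
df['net_return'] = df['gross_return'] - (df['trade_flag'] * 2 * total_cost_bps / 10000)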

Rule of thumb: If your Sharpe drops by > 20% after adding transaction costs, the edge is marginal.
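To apply the rule of thumb, compute Sharpe before and after costs and look at the relative drop. The sketch below uses plain lists and a made-up repeating return pattern with one round trip every five days; only the cost arithmetic mirrors the snippet above:

```python
import math
import statistics


def annualized_sharpe(returns):
    """Annualized Sharpe with a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(252)


def apply_costs(gross_returns, trade_flags, cost_bps_per_side):
    """Subtract a round-trip cost (2 x per-side cost) on flagged days."""
    round_trip = 2 * cost_bps_per_side / 10000
    return [r - flag * round_trip for r, flag in zip(gross_returns, trade_flags)]


# Hypothetical data: a repeating 5-day pattern, 250 days in total
gross = [0.012, -0.008, 0.006, -0.004, 0.002] * 50
flags = [1 if i % 5 == 0 else 0 for i in range(len(gross))]

net = apply_costs(gross, flags, cost_bps_per_side=5)
gross_sharpe = annualized_sharpe(gross)
net_sharpe = annualized_sharpe(net)
drop = (gross_sharpe - net_sharpe) / gross_sharpe
print(f"Sharpe drop after costs: {drop:.1%}")
```

The Sharpe values themselves are meaningless for such an artificial series; what matters is the size of `drop` relative to the 20% threshold.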

Section 4 template

SECTION 4: BACKTEST RESULTS — Corn Futures Strategy

4.1 Performance Metrics (2010–2022 in-sample)

  • Sharpe Ratio (gross): 1.07
  • Sharpe Ratio (net, 5bps round-trip): 0.98
  • Annualized Return (gross): 14.07%
  • Annualized Volatility: 13.17%
  • Maximum Drawdown: -16.37%
  • Sortino Ratio: 1.76
  • Calmar Ratio: 0.86
  • Win Rate: 54.35%
  • Profit Factor: 1.36
  • Total Trades: 487

4.2 Out-of-Sample Validation

Hold-out period: 2023–2024 (final 1/7 of dataset, untouched during development)

  • Sharpe Ratio (OOS): 0.87 (19% degradation from the gross IS Sharpe of 1.07)
  • Annualized Return (OOS): 10.8%
  • Max Drawdown (OOS): -12.1%
  • Win Rate (OOS): 51.2%

Degradation is normal and expected. The OOS Sharpe of 0.87 is solid.

4.3 Risk Analysis

Sortino (1.76) > Sharpe (1.07) indicates downside volatility is well-managed: most volatility comes from upside swings. Calmar (0.86) is annualized return divided by maximum drawdown (14.07% / 16.37%), so one year of average return does not quite cover the worst drawdown. Maximum single drawdown of -16.37% lasted approximately 4 months (recovery by month 5).

4.4 Transaction Cost Sensitivity

Assuming 5bps round-trip cost (commission + slippage), net Sharpe = 0.98, about 8% below the gross 1.07. At 10bps round-trip, Sharpe drops to 0.88, about 18% below gross. Edge survives transaction costs.

Common mistakes

Five backtest reporting failures

  • Reporting in-sample Sharpe as if it's out-of-sample. Always label it: "In-sample Sharpe 1.45, hold-out OOS Sharpe 0.92." Readers assume the better number unless told otherwise.
  • Not showing the equity curve. A single Sharpe number hides everything: path to returns, drawdown recovery speed, regime changes. Always include the equity curve chart in Section 4.
  • Optimizing to the hold-out period. If you tune parameters to 2023 data and report 2023 results as OOS, you haven't validated anything. OOS data must be truly untouched.
  • Forgetting to annualize metrics. Daily Sharpe is ~16x lower than annualized. Always specify the basis.
  • Ignoring transaction costs or using unrealistically low estimates. Use 5bps minimum for round-trip (commission + slippage + opportunity cost). For illiquid assets, add 10–20bps.