IP Anchor: Section 1, Hypothesis

This week IS Section 1. Submit the template below as your IP Section 1.

What this week covers

A strong hypothesis has three components: (1) an economic mechanism explaining why the inefficiency exists, (2) the predicted relationship between signal and return, and (3) a falsifiability condition. This week teaches you to write all three, and walks through hypothesis types with real examples.

The three components of a strong hypothesis

1. Economic mechanism

Why does this pattern exist? Not "I found it in the data." Why would market participants allow this inefficiency to persist?

Good mechanisms explain:

  • Behavioral: What bias or constraint causes participants to misbehave? (anchoring, disposition effect, herding, underreaction)
  • Structural: What market constraint forces suboptimal behavior? (hedging requirements, index rebalancing, roll yield, storage costs)

2. Predicted relationship

What signal predicts what return, in what direction, over what horizon?

Example: "When the 30-day rolling soil moisture deviation is more than 1 standard deviation below the 20-year seasonal average, corn futures are expected to have positive returns over the next 5–10 trading days, because low soil moisture predicts lower yields and therefore tighter supply, which the market will reprice upward when USDA yield reports are released."

Specify:

  • The signal (soil moisture deviation)
  • The direction (low moisture → lower yields → higher returns)
  • The horizon (5–10 trading days)
  • The mechanism link (yields → prices)
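As a rough sketch of how the signal in the example could be computed, the Python below turns a 30-day moisture window into a z-score against same-season history and checks the 1-standard-deviation trigger. The function names, the input layout (one 30-day mean per prior year), and the sample numbers are assumptions for illustration, not a prescribed data pipeline.

```python
from statistics import mean, stdev


def moisture_z_score(recent_30d, seasonal_means):
    """Z-score of the current 30-day mean vs. same-season history.

    recent_30d: the last 30 daily soil-moisture readings (hypothetical units).
    seasonal_means: one 30-day mean per prior year for the same calendar
        window (the example in the text uses 20 years).
    """
    current = mean(recent_30d)
    baseline = mean(seasonal_means)
    spread = stdev(seasonal_means)  # sample standard deviation across years
    return (current - baseline) / spread


def signal_fires(z, threshold=-1.0):
    """True when moisture is more than 1 standard deviation below average."""
    return z < threshold


# Made-up numbers, for illustration only.
history = [0.30, 0.32, 0.28, 0.31, 0.29]
z = moisture_z_score([0.20] * 30, history)
print(round(z, 2), signal_fires(z))
```

The sketch only says whether the signal fires. The direction and horizon of the expected return still come from your written hypothesis, not from the code.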

3. Falsifiability condition

What result would convince you the hypothesis is wrong? If you can't answer this, you don't have a hypothesis.

Example: "This hypothesis is falsified if (a) the Information Coefficient between soil moisture deviation and forward corn returns is less than 0.02 (no predictive power), or (b) the backtest Sharpe on a 2023 hold-out period is less than 0.5 (results don't hold out-of-sample)."

Good falsifiability conditions are:

  • Testable: You can calculate it from data
  • Specific: Explicit number, not "seems promising"
  • Fair: Not unreasonably high (Sharpe 5.0) or low (0.1)
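One way to keep a falsifiability condition honest is to write it down as a check before running anything. The sketch below encodes the two example thresholds from the text (IC below 0.02, hold-out Sharpe below 0.5); the function name and the sample inputs are hypothetical.

```python
def falsification_reasons(ic, holdout_sharpe, min_ic=0.02, min_sharpe=0.5):
    """Return the list of pre-stated conditions that the results violate.

    An empty list means the hypothesis survives this test; it does not
    prove the hypothesis is true.
    """
    reasons = []
    if ic < min_ic:
        reasons.append(f"IC {ic:.3f} is below {min_ic}")
    if holdout_sharpe < min_sharpe:
        reasons.append(
            f"hold-out Sharpe {holdout_sharpe:.2f} is below {min_sharpe}"
        )
    return reasons


# Made-up results, for illustration only.
print(falsification_reasons(ic=0.035, holdout_sharpe=0.3))
```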

Hypothesis template

Copy this template into Section 1 of your IP. Fill in each section with your hypothesis.

HYPOTHESIS TEMPLATE

Economic mechanism:

[Market participants systematically _____ because _____, which causes prices to _____ .]

Predicted relationship:

[When [signal] is [high/low/rising/falling], [asset] returns are expected to be [positive/negative] over [horizon], because [mechanism above].]

Falsifiability:

[This hypothesis would be rejected if [specific quantitative condition — e.g., Information Coefficient < 0.02, or Sharpe < 0.5 on hold-out period, or directional accuracy < 55% on out-of-sample data].]

Hypothesis types with examples

Type 1: Behavioral inefficiency

Example: Post-earnings drift (momentum)

Economic mechanism: Investors systematically underreact to earnings news. The market reprices gradually over weeks or months, not immediately at announcement. Short-selling constraints and institutional limitations on volatility exposure prevent arbitrageurs from eliminating this drift immediately.

Predicted relationship: When earnings surprise is large and positive (actual EPS > consensus EPS by more than 1 standard deviation), the stock is expected to have positive abnormal returns for 3–6 months post-announcement.

Falsifiability: This hypothesis is rejected if the average post-announcement drift is not significantly different from zero (t-stat < 1.96) over a 3-month hold period, or if the drift reverses (negative returns) in the first week after announcement.
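The significance check in this falsifiability condition can be sketched as a one-sample t-statistic on the post-announcement abnormal returns. This is a simplified illustration: it assumes one abnormal return per event and treats events as independent, which real earnings data often violates (announcements cluster in time). The function name and the sample returns are made up.

```python
from math import sqrt
from statistics import mean, stdev


def drift_t_stat(abnormal_returns):
    """t-statistic for the null that the mean abnormal return is zero."""
    n = len(abnormal_returns)
    standard_error = stdev(abnormal_returns) / sqrt(n)
    return mean(abnormal_returns) / standard_error


# Hypothetical 3-month abnormal returns after large positive surprises.
sample = [0.02, 0.01, 0.03, 0.02, 0.04, 0.01, 0.03, 0.02]
t = drift_t_stat(sample)
print(round(t, 2), "rejected" if t < 1.96 else "not rejected")
```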

Type 2: Structural inefficiency

Example: Commodity roll yield (contango/backwardation)

Economic mechanism: Commodity futures contracts are rolled before expiry. When the term structure is in backwardation (near-month contract trading at a premium to far-month), rolling captures positive yield. This is not an arbitrage — it reflects the physical convenience value of holding inventory. Hedgers (producers and consumers) are willing to pay this premium because holding physical commodity has value. The premium is persistent because the underlying convenience value doesn't disappear.

Predicted relationship: When the front/second-month spread is in backwardation (front price > second-month price), a rolling long position in the front-month contract is expected to earn positive roll yield equal to the spread minus the cost of financing. Expected return is positive.

Falsifiability: This hypothesis is rejected if average roll yield is zero or negative over a 5-year period, or if roll yield does not persist after transaction costs.
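A minimal sketch of the backwardation check and an annualized roll yield follows. The formula used, (front minus second) divided by second, scaled by 365 over the days between expiries, is one common convention and is an assumption here; the financing-cost adjustment mentioned in the predicted relationship is not modeled. Prices are made up.

```python
def is_backwardated(front_price, second_price):
    """True when the front-month contract trades above the second month."""
    return front_price > second_price


def annualized_roll_yield(front_price, second_price, days_between_expiries):
    """Positive in backwardation, negative in contango (before costs)."""
    spread = (front_price - second_price) / second_price
    return spread * (365.0 / days_between_expiries)


# Hypothetical front and second-month prices, 30 days apart.
print(is_backwardated(82.0, 80.0))
print(round(annualized_roll_yield(82.0, 80.0, 30), 4))
```

To test the falsifiability condition, you would average this figure over the full 5-year sample and then subtract your estimated transaction costs.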

Type 3: Alternative data (information latency)

Example: Satellite weather data predicting crop yields

Economic mechanism: NASA satellite data provides daily measurements of soil moisture and temperature at the field level. These measurements predict crop yields. The market prices crops based on USDA forecasts, which are survey-based and are released once per month. There is an information gap: satellite data predicts yields before USDA reports them. The market does not instantaneously incorporate satellite data because (a) the data is not in standard market feeds, (b) processing it requires domain expertise, and (c) few market participants have the access or motivation to use it.

Predicted relationship: When growing-season soil moisture anomalies (satellite soil moisture deviation from the 20-year seasonal average) are large and negative, corn prices are expected to rise in the 5–10 trading days preceding the next USDA yield report, as the market reprices toward the lower yield estimate that the report will confirm.

Falsifiability: This hypothesis is rejected if the Information Coefficient between satellite moisture deviation and forward corn returns is < 0.02, or if the strategy has negative Sharpe on a 2023 hold-out period.

Information Coefficient (IC)

IC measures the correlation between your signal and forward returns. It's the core metric for evaluating whether your hypothesis has predictive power before you build the full backtest.

\[ IC = \text{Corr}(\text{signal}_t,\ r_{t+h}) \]

Here signal_t is your signal at time t, and r_{t+h} is the return from t to t+h, where h is your chosen horizon.
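The definition above can be sketched in a few lines of plain Python. This version uses a Pearson correlation to match the formula; many practitioners use a rank (Spearman) correlation instead, which is less sensitive to outliers. The function names, the alignment convention (signal at t paired with the return from t to t+h), and the sample data are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def forward_returns(prices, h):
    """Simple return from t to t+h for every t that has a price at t+h."""
    return [prices[t + h] / prices[t] - 1.0 for t in range(len(prices) - h)]


def information_coefficient(signal, prices, h):
    """Corr(signal_t, r_{t+h}); the last h signal values have no return yet."""
    fwd = forward_returns(prices, h)
    aligned_signal = signal[: len(fwd)]
    return pearson(aligned_signal, fwd)


# Made-up data, for illustration only.
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0]
signal = [0.5, -0.8, 1.2, 0.9, -0.3, 0.7, 0.1]
print(round(information_coefficient(signal, prices, h=1), 3))
```

Dropping the last h signal values matters: pairing a signal with a return that was already known at time t would introduce look-ahead bias and inflate the IC.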

Interpretation:

IC Information Ratio (ICIR):

\[ ICIR = \frac{\overline{IC}}{\sigma_{IC}} \]

ICIR is the average IC across periods divided by the standard deviation of IC across those same periods. ICIR > 0.5 suggests a consistently predictive signal.
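As a small illustration of the ICIR formula, the sketch below takes one IC per period (for example, one per month) and divides the mean by the sample standard deviation. The function name and the monthly IC values are made up.

```python
from statistics import mean, stdev


def icir(period_ics):
    """Mean IC divided by the sample standard deviation of IC."""
    return mean(period_ics) / stdev(period_ics)


# Hypothetical monthly ICs, for illustration only.
monthly_ics = [0.04, 0.06, 0.05, 0.03, 0.07]
print(round(icir(monthly_ics), 2))
```

With only a handful of periods the standard deviation is poorly estimated, so treat an ICIR computed on a short sample with caution.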

Common mistakes

Five hypothesis failures

  • Writing the hypothesis as a tautology. "When momentum is positive, the stock continues to have positive momentum." This is circular — of course it's true. A real hypothesis explains WHY.
  • No falsifiability condition. If you can't state what result would make you reject the hypothesis, you don't have a hypothesis. Add a specific quantitative threshold.
  • Economic mechanism stated too vaguely. "The market is inefficient" doesn't distinguish from data mining. "Investors underreact to earnings by an average of 6 weeks due to limited attention" is specific enough to test.
  • Signal direction not specified before testing. "I'll look at momentum and see if it predicts returns" is wrong. State the direction and horizon before you test: "High momentum predicts positive returns over a 1-month horizon."
  • Hypothesis changed after seeing the data. If you test 10 signal variants and then form a hypothesis around the best one, you're p-hacking. The hypothesis must come before the backtest.
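One possible guard against the last two failures is to record the hypothesis in an immutable object before touching the data. The sketch below uses a frozen dataclass; the class name, field names, and example values are hypothetical and simply mirror the template fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A pre-registered hypothesis; fields cannot be changed after creation."""

    mechanism: str
    signal: str
    direction: int      # +1 (signal high -> positive returns) or -1
    horizon_days: int   # forward-return horizon in trading days
    min_ic: float       # falsification threshold, fixed in advance
    recorded_on: str    # when the hypothesis was written down

    def __post_init__(self):
        if self.direction not in (1, -1):
            raise ValueError("direction must be +1 or -1, set before testing")


# Illustrative values only.
h = Hypothesis(
    mechanism="Investors underreact to earnings news due to limited attention",
    signal="standardized earnings surprise",
    direction=1,
    horizon_days=63,
    min_ic=0.02,
    recorded_on="(date you wrote it down)",
)
print(h.direction, h.horizon_days)
```

Assigning to a field afterwards (for example `h.direction = -1`) raises an error, so a revised hypothesis has to be written as a new, separately dated record.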