Understand what alternative data is, where to find it, and how to validate whether it contains real signal. Know the pitfalls before building a strategy.
Alternative data is any non-standard data source that provides an information edge. This week walks through three major categories (satellite, positioning, sentiment), real examples from AlgoGators, and the key validation steps before committing research time to an alternative data hypothesis.
Standard data: OHLCV prices, fundamentals (earnings, balance sheets), economic releases.
Alternative data: Anything else — satellites, credit card transactions, web scraping, shipping data, options flow, weather.
The four questions to ask about any alternative dataset:
Data source: NASA POWER API. Free, public. Daily data back to 1981.
Variables: Solar radiation (ALLSKY_SFC_SW_DWN), temperature (T2M, T2M_MAX, T2M_MIN), precipitation (PRECTOTCORR), humidity.
Geographic coverage: Any latitude/longitude on Earth. You specify the point or region in the request.
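A minimal sketch of how a request for these variables might be assembled and its response flattened. The endpoint path, the `community=AG` value, the JSON layout (`properties` → `parameter` → variable → date), and the `-999` missing-value sentinel are assumptions based on the public POWER daily point service and should be checked against the current API documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for the POWER daily point service; verify before use.
POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"
FILL_VALUE = -999.0  # assumed sentinel for missing observations


def build_power_url(lat, lon, start, end,
                    parameters=("T2M_MAX", "T2M_MIN", "PRECTOTCORR")):
    """Build a daily point query URL. Dates are YYYYMMDD strings."""
    query = {
        "parameters": ",".join(parameters),
        "community": "AG",  # assumed: agroclimatology community
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_DAILY_POINT}?{urlencode(query, safe=',')}"


def parse_power_payload(payload):
    """Reshape {variable: {date: value}} into {date: {variable: value}}.

    Values equal to the fill sentinel become None.
    """
    by_variable = payload["properties"]["parameter"]
    rows = {}
    for variable, series in by_variable.items():
        for day, value in series.items():
            rows.setdefault(day, {})[variable] = (
                None if value == FILL_VALUE else value
            )
    return rows
```

The URL can then be fetched with any HTTP client (for example `requests.get(url).json()`) and the decoded JSON passed to `parse_power_payload` before loading into a database.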
Information advantage: Crop yields depend on growing-season weather. USDA yield forecasts are survey-based and are released monthly, on dates announced in advance. Satellite-derived data provides daily conditions at the grid-cell level. The market does not immediately incorporate this data because (a) it is not in standard feeds, (b) it requires domain expertise to process, and (c) few participants use it.
Signal: Growing-degree-day (GDD) deviation from the 20-year seasonal average. When cumulative GDD is abnormally low relative to that average for the same calendar week, crop stress is high. This predicts lower USDA yields.
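The GDD deviation can be sketched as follows. The base temperature of 10 °C and cap of 30 °C are the conventional corn thresholds and are an assumption here, as is the choice to clamp both the daily high and low; the source does not specify a GDD formula. Function names are hypothetical.

```python
def daily_gdd(t_max, t_min, base=10.0, cap=30.0):
    """Growing degree days for one day, in degree-Celsius days.

    Both temperatures are clamped to [base, cap] before averaging
    (assumed corn convention).
    """
    hi = min(max(t_max, base), cap)
    lo = min(max(t_min, base), cap)
    return (hi + lo) / 2.0 - base


def cumulative_gdd(t_max_series, t_min_series, base=10.0, cap=30.0):
    """Running season-to-date GDD total, one value per day."""
    total = 0.0
    running = []
    for t_max, t_min in zip(t_max_series, t_min_series):
        total += daily_gdd(t_max, t_min, base, cap)
        running.append(total)
    return running


def gdd_deviation(current_cumulative, historical_cumulatives):
    """Current cumulative GDD minus the mean of prior seasons' cumulative
    GDD at the same point in the season. Negative means a cooler season."""
    seasonal_mean = sum(historical_cumulatives) / len(historical_cumulatives)
    return current_cumulative - seasonal_mean
```

`historical_cumulatives` would hold one value per prior year, each computed with `cumulative_gdd` up to the same calendar week, so that the comparison is like for like.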
Validation: Information Coefficient (IC) between satellite GDD deviation and corn yield surprises (actual USDA yield minus the prior month's forecast). An IC above 0.05 indicates meaningful predictive power.
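One common way to compute an IC is the Spearman rank correlation between the signal and the outcome it is meant to predict. The sketch below takes that definition as an assumption (the source does not say which correlation is used) and sticks to the standard library. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def information_coefficient(signal, outcome):
    """Spearman rank IC: Pearson correlation of the two rank series."""
    if len(signal) != len(outcome):
        raise ValueError("signal and outcome must have the same length")
    return pearson(average_ranks(signal), average_ranks(outcome))
```

Here `signal` would be the GDD deviation observed before each USDA report and `outcome` the corresponding yield surprise. With only a handful of reports per season the estimate is noisy, so it is worth looking at the IC across many years rather than a single one.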
Cost & access: Free. No licensing required. Ingestible via Python requests → database. No budget approval needed.
Data source: CFTC Commitments of Traders (COT) reports. Free, published every Friday. Data lag: three days, since Friday's report shows positions as of Tuesday.
Coverage: All major US futures (oil, gold, corn, wheat, currency, interest rates, equities).
Information advantage: COT breaks down positioning by trader category: commercial hedgers (producers, consumers), large speculators, small speculators. When large speculators are extremely net long (crowded trade), the position often reverses. When commercial hedgers are extremely short (hedging supply), the commodity is often at peak prices.
Signal: Speculator positioning z-score. When net speculator positioning is more than 2 standard deviations above its 20-year average (crowded long), short the commodity. When it is more than 2 standard deviations below (crowded short), go long. This is a contrarian signal.
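A minimal sketch of the z-score rule, assuming a list of weekly net speculator positions in time order. The window uses only observations strictly before the current one to avoid look-ahead. With weekly data, a 20-year lookback is roughly 1,040 observations. The function name and the choice of sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def zscore_signal(net_positions, lookback, threshold=2.0):
    """Contrarian signal per observation: -1 short, +1 long, 0 flat.

    Each z-score compares the current value with the mean and sample
    standard deviation of the previous `lookback` observations.
    """
    signals = []
    for t, value in enumerate(net_positions):
        window = net_positions[max(0, t - lookback):t]
        if len(window) < lookback:
            signals.append(0)  # not enough history yet
            continue
        spread = stdev(window)
        if spread == 0:
            signals.append(0)  # z-score undefined for a constant window
            continue
        z = (value - mean(window)) / spread
        if z > threshold:
            signals.append(-1)  # crowded long: fade it
        elif z < -threshold:
            signals.append(1)   # crowded short: fade it
        else:
            signals.append(0)
    return signals
```

For example, `zscore_signal(history, lookback=1040)` would stay flat for the first 1,040 weeks and only then begin emitting signals.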
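Data lag risk: COT data is released on Friday for positions as of Tuesday, so a signal cannot be traded on the Tuesday it describes. The earliest you can act is after the Friday release, which in practice means late Friday or the following Monday. Backtests must timestamp each observation by its release date, not its report date. This lag is acceptable for daily/weekly signals.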
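To keep a backtest honest about this lag, each report can be mapped to the first session on which it could have been traded. The sketch below assumes the standard Tuesday-to-Friday schedule and ignores holiday-shifted releases and exchange holidays. Defaulting to the next business day is a conservative assumption rather than a rule from the source. Function names are hypothetical.

```python
from datetime import date, timedelta


def cot_release_date(report_date):
    """Friday release date for a report dated on a Tuesday."""
    if report_date.weekday() != 1:  # Monday is 0, Tuesday is 1
        raise ValueError("COT positions are reported as of a Tuesday")
    return report_date + timedelta(days=3)


def first_tradable_date(report_date, trade_on_release_day=False):
    """First date a signal built from this report could be traded.

    By default, assume the release arrives too late on Friday to act,
    so the first tradable session is the next weekday.
    """
    release = cot_release_date(report_date)
    if trade_on_release_day:
        return release
    next_day = release + timedelta(days=1)
    while next_day.weekday() >= 5:  # skip Saturday and Sunday
        next_day += timedelta(days=1)
    return next_day
```

Joining prices to signals on `first_tradable_date` rather than on the report date is what prevents the backtest from using Tuesday's positions three days before they were public.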
Cost & access: Free from CFTC website. Historical data via Bloomberg Terminal. Parseable via Python.
Data source: Options flow (put/call volume, open interest, implied volatility). Available from exchanges (CBOE, CME) or data providers (Databento, OptionsIntelligence).
Information advantage: Options traders are often informed (they are willing to pay option premium to express a view). Unusual put buying can precede declines. Skew in implied volatility surfaces can signal tail-hedging demand.
Signal examples:
Cost & access: Included in Databento subscription (existing infrastructure).
Before committing 10 weeks to an alternative data hypothesis, validate:
Hypothesis 1: Satellite weather → corn futures
Growing-season soil moisture anomalies, measured via satellite, predict USDA yield report surprises. Market prices are set on survey data; satellite data arrives before surveys are finalized. Information lag of 2–4 weeks.
Hypothesis 2: Speculator crowding → FX reversals
When CFTC large speculators are positioned at extreme net-long (historical 95th percentile), the currency pair reverses within 2–8 weeks. Contrarian edge from crowded positioning.
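The percentile test in this hypothesis can be sketched as below, assuming `history` holds only net positions observed before the value being tested so that the threshold is never set with future data. The "at or below" percentile definition and the function names are assumptions.

```python
def percentile_rank(history, value):
    """Percentage of historical observations at or below `value`."""
    if not history:
        raise ValueError("history must not be empty")
    at_or_below = sum(1 for past in history if past <= value)
    return 100.0 * at_or_below / len(history)


def is_crowded_long(history, value, cutoff=95.0):
    """True when `value` sits at or above the cutoff percentile."""
    return percentile_rank(history, value) >= cutoff
```

Unlike a z-score, a percentile rank makes no assumption about the shape of the positioning distribution, which can matter when net positions are skewed or fat-tailed.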
Hypothesis 3: Options skew → equity tail protection
When put/call skew steepens (downside IV exceeds upside IV by more than 2 vol points), downside is overhedged. The hypothesis is that the market rebounds 3–10 days after peak skew, driven by mean reversion in hedging demand.
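A minimal sketch of the skew condition, assuming implied volatilities are quoted as decimals (0.25 for 25%) and that a matched pair of out-of-the-money put and call IVs is available for each day. Which strikes or deltas to pair is not specified in the source and is left to the caller. Function names are hypothetical.

```python
def skew_vol_points(put_iv, call_iv):
    """Downside minus upside implied volatility, in vol points."""
    return (put_iv - call_iv) * 100.0


def steep_skew_days(put_ivs, call_ivs, threshold=2.0):
    """Indices of days on which skew exceeds `threshold` vol points."""
    return [
        day
        for day, (put_iv, call_iv) in enumerate(zip(put_ivs, call_ivs))
        if skew_vol_points(put_iv, call_iv) > threshold
    ]
```

Flagging steep-skew days is only the first step. Testing the hypothesis would also require identifying the local peak in skew and measuring forward returns over the following 3–10 days.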