The only test I trust
Picking winners is the easy part — the betting price already knows who's likely to win. The real question is whether a model can be sharper than the price.
So the test isn't whether it picks winners. It's whether it can beat the closing line, the last price before kickoff. By then the price has absorbed every public model and all the sharp money, which makes it about as accurate as a price gets. Beat that, or there's no edge to find.
I wrote the pass/fail rule down before running anything, so I couldn't move the goalposts: the model passes only if, on games it never trained on, it systematically gets ahead of the closing price. The measure is closing line value (CLV): whether the price moves toward the model's picks between the bet and the close. A strict test, and a fair one.
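For concreteness, here is one common way to score CLV on a single bet, assuming decimal odds: the price taken divided by the closing price, minus one. This is a minimal sketch, not necessarily the exact formula used here. The function names and sample odds are illustrative, and the bookmaker margin is ignored.

```python
def clv(odds_taken: float, odds_close: float) -> float:
    """Closing line value of one bet, using decimal odds.

    Positive means the bet was struck at a better price than the close,
    i.e. the market moved toward the pick after the bet was placed.
    """
    if odds_taken <= 1.0 or odds_close <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return odds_taken / odds_close - 1.0


def mean_clv(bets: list[tuple[float, float]]) -> float:
    """Average CLV over (odds_taken, odds_close) pairs."""
    if not bets:
        raise ValueError("no bets to score")
    return sum(clv(taken, close) for taken, close in bets) / len(bets)
```

Taking 2.10 on a side that closes at 2.00 scores about +5%; taking 1.90 on the same side scores about -5%. A model with an edge should show a positive average over many bets.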
Built the simplest model — Elo
One rating per team, learned only from past results, dates, and venue. No players, no injuries. A Davidson draw model turns rating gaps into home/draw/away probabilities. Ratings update one game at a time in date order, so no forecast can peek at the future.
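A rough sketch of how a Davidson-style draw model can sit on top of Elo ratings. The home-advantage bonus, the draw parameter `nu`, and the K-factor below are placeholder values chosen for illustration, not the fitted ones, and the update rule (a draw counts as half a win) is an assumption.

```python
import math


def davidson_probs(r_home, r_away, home_adv=60.0, nu=1.0):
    """Return (p_home, p_draw, p_away) from two Elo-scale ratings."""
    # Strengths on the usual Elo scale: 400 points = 10x the strength.
    g_home = 10 ** ((r_home + home_adv) / 400)
    g_away = 10 ** (r_away / 400)
    # Davidson's tie term: proportional to the geometric mean of strengths.
    tie = nu * math.sqrt(g_home * g_away)
    total = g_home + g_away + tie
    return g_home / total, tie / total, g_away / total


def update(r_home, r_away, result, k=20.0, home_adv=60.0, nu=1.0):
    """Zero-sum rating update. result: 1.0 home win, 0.5 draw, 0.0 away win."""
    p_home, p_draw, _ = davidson_probs(r_home, r_away, home_adv, nu)
    expected = p_home + 0.5 * p_draw  # expected score for the home side
    delta = k * (result - expected)
    return r_home + delta, r_away - delta
```

With equal ratings, no home advantage, and `nu=1.0`, the three outcomes come out equally likely; raising `nu` shifts probability toward the draw.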
The club benchmark
Walk-forward over 24,359 matches in the five big European leagues, betting value picks at Pinnacle's pre-close price and grading against its close.
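The walk-forward loop is the part that keeps the test honest: forecast first, then reveal the result and update. A minimal sketch follows. For brevity it uses the plain two-outcome Elo expectation rather than the Davidson model, and the `Match` type, the `min_edge` threshold, and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Match:
    day: date
    home: str
    away: str
    result: float  # 1.0 home win, 0.5 draw, 0.0 away win


def expected_home(r_home, r_away, home_adv=60.0):
    """Standard Elo expected score for the home side."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home - home_adv) / 400))


def walk_forward(matches, k=20.0, start=1500.0):
    """Forecast each match using only ratings built from earlier matches."""
    ratings = {}
    forecasts = []
    for m in sorted(matches, key=lambda m: m.day):
        r_h = ratings.get(m.home, start)
        r_a = ratings.get(m.away, start)
        p = expected_home(r_h, r_a)   # forecast before the result is used
        forecasts.append((m, p))
        delta = k * (m.result - p)    # only now does the result enter
        ratings[m.home] = r_h + delta
        ratings[m.away] = r_a - delta
    return forecasts, ratings


def is_value(p_model, odds, min_edge=0.02):
    """Flag a bet when model probability times decimal odds beats break-even."""
    return p_model * odds - 1.0 > min_edge
```

Each forecast returned by `walk_forward` can then be compared with the pre-close price via `is_value`, and any bet it flags graded against the close.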
Pivoted to the World Cup
Rebuilt for national teams with neutral-venue handling. Tested the hardest case — the knockout stage (binary, no draws) — training on everything else.
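One simple way to handle both changes, sketched under assumptions: drop the home bonus to zero at neutral venues, and use a two-outcome Elo expectation for who advances. The write-up does not spell out its exact handling, so the function and its `home_adv` value are illustrative only.

```python
def advance_prob(r_a, r_b, neutral=True, home_adv=60.0):
    """Probability that team A advances in a knockout tie (no draw outcome).

    Team A is treated as the home side when the venue is not neutral.
    """
    bonus = 0.0 if neutral else home_adv
    return 1.0 / (1.0 + 10 ** ((r_b - r_a - bonus) / 400))
```

At a neutral venue two equally rated teams come out at exactly 50/50, and the two sides' probabilities always sum to one.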
Checked it against real money
Pulled live 2026 World Cup prices from Kalshi & Polymarket and measured CLV from Kalshi's own minute-by-minute price history.
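On a prediction market the price is already a probability, so CLV reduces to how far the price moved between entry and kickoff. A sketch, assuming the price history has already been downloaded into time-sorted `(timestamp, yes_price)` pairs with prices on a 0 to 1 scale. That layout is hypothetical and is not the exchange's own format.

```python
from bisect import bisect_right


def price_at(history, when):
    """Last recorded price at or before `when`; history is sorted by time."""
    times = [t for t, _ in history]
    i = bisect_right(times, when)
    if i == 0:
        raise ValueError("no price recorded before the requested time")
    return history[i - 1][1]


def market_clv(history, entry_time, kickoff):
    """Move in the YES price (probability points) from entry to close.

    Positive means the market drifted toward the side that was bought.
    """
    return price_at(history, kickoff) - price_at(history, entry_time)
```

Buying YES at 0.40 on a contract that last trades at 0.45 before kickoff gives a CLV of about +0.05, regardless of how the match ends.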
Stress-tested every disagreement
A multi-agent research pass took the 12 games where the model disagreed most with the market and asked, with real team news: is the market wrong, or does it know something our score-only model doesn't?
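Picking the games to investigate can be as simple as ranking by the gap between the two probabilities. A sketch, assuming "disagreed most" means the largest absolute difference between model and market probability. The tuple layout is illustrative.

```python
def top_disagreements(games, n=12):
    """games: iterable of (label, p_model, p_market) tuples.

    Returns the n games with the largest absolute probability gap,
    biggest gap first.
    """
    return sorted(games, key=lambda g: abs(g[1] - g[2]), reverse=True)[:n]
```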
Clubs: football-data.co.uk. Results plus Pinnacle closing odds, 5 leagues, ~14 seasons.
Internationals: martj42/international_results. Every national-team game since 1872 (~49k), plus 2026 fixtures and penalty shootouts.
Live markets: Kalshi per-match moneylines and price history, plus Polymarket.