Stats: the receipts, as they accumulate
Recent trend
Active calibration
The model has a learning loop: every night it re-reads the log and computes how much to shrink
each segment's edges based on observed hit rates. A factor of 1.00 means the raw model output is
published unchanged; anything below 1.00 means the segment has historically underperformed enough
that its published edge is discounted. The factor is floored at 0.50 and capped at 1.00: we can shrink, never inflate.
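As a rough illustration of the bounded shrink, here is a minimal sketch. The exact nightly formula is not spelled out on this page, so the ratio of observed to expected hit rate, the `min_plays` threshold, and the names `shrink_factor` and `published_edge` are all assumptions made for the example.

```python
def shrink_factor(hits: int, plays: int, expected_rate: float,
                  floor: float = 0.50, min_plays: int = 30) -> float:
    """Hypothetical shrink factor for one segment.

    Assumption: the factor is the observed hit rate divided by the hit rate
    the model expected, clamped to [floor, 1.0]. The real nightly job may
    use a different formula.
    """
    if plays < min_plays or expected_rate <= 0:
        # Too little evidence to justify a discount (assumed behaviour).
        return 1.0
    ratio = (hits / plays) / expected_rate
    # Never below the floor, never above 1.0: shrink only, no inflation.
    return max(floor, min(1.0, ratio))


def published_edge(raw_edge: float, factor: float) -> float:
    """Discount the raw model edge by the segment's shrink factor."""
    return raw_edge * factor
```

Under these assumptions, a segment that hit 45 of 100 plays against an expected 60% gets a factor of about 0.75, while a segment that outperformed stays at 1.00.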
Lineup-aware coverage — afternoon re-run
The morning slate uses team-level offense. At 5 PM ET we re-fetch each game's posted lineup and rebuild the offense number from the actual hitters (weighted by batting order, with L/R splits against the opposing starter). Picks can change as a result. This table tracks coverage and how often the lineup re-run flipped the model's morning pick.
| computing... |
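The lineup rebuild can be pictured as a weighted average over the posted hitters. The sketch below is illustrative only: the `Hitter` fields, the `ORDER_WEIGHTS` values, and the function names are hypothetical, and the actual weights and offense metric used by the model are not given here.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    vs_lhp: float  # offense metric against left-handed starters (hypothetical)
    vs_rhp: float  # offense metric against right-handed starters (hypothetical)


# Illustrative weights only: earlier batting-order slots count slightly more.
ORDER_WEIGHTS = [1.10, 1.075, 1.05, 1.025, 1.00, 0.975, 0.95, 0.925, 0.90]


def lineup_offense(lineup: list[Hitter], starter_throws: str) -> float:
    """Weighted average of each hitter's split against the starter's hand."""
    if starter_throws not in ("L", "R"):
        raise ValueError("starter_throws must be 'L' or 'R'")
    if not lineup:
        raise ValueError("lineup is empty")
    values = [h.vs_lhp if starter_throws == "L" else h.vs_rhp for h in lineup]
    weights = ORDER_WEIGHTS[:len(values)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def flip_rate(morning_picks: list[str], afternoon_picks: list[str]) -> float:
    """Share of games where the afternoon re-run changed the morning pick."""
    if len(morning_picks) != len(afternoon_picks) or not morning_picks:
        raise ValueError("need two non-empty pick lists of equal length")
    flips = sum(m != a for m, a in zip(morning_picks, afternoon_picks))
    return flips / len(morning_picks)
```

With weights like these, moving the strongest hitters to the top of the order raises the lineup number even though the same nine players are involved.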
Rolling model win %
AGREE vs FLIP — the audit thesis
AGREE = model and market favour the same side; the model just thinks the favourite should be a bigger favourite. FLIP = they disagree on which team is the favourite. If FLIP plays systematically underperform AGREE, the edge filter is leaking, and we know what to fix.
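The AGREE/FLIP split reduces to a comparison of which side each probability favours. The following is a minimal sketch, assuming both the model and the market are expressed as a win probability for the home team; the function names and the tuple layout of `plays` are hypothetical.

```python
def classify(model_home_prob: float, market_home_prob: float) -> str:
    """Label a play AGREE when model and market favour the same team.

    Assumption: a probability of exactly 0.5 is treated as favouring the
    away side, purely to keep the example simple.
    """
    model_favours_home = model_home_prob > 0.5
    market_favours_home = market_home_prob > 0.5
    return "AGREE" if model_favours_home == market_favours_home else "FLIP"


def hit_rate_by_group(plays: list[tuple[float, float, bool]]) -> dict[str, float]:
    """Hit rate per group; each play is (model_prob, market_prob, won)."""
    wins = {"AGREE": 0, "FLIP": 0}
    counts = {"AGREE": 0, "FLIP": 0}
    for model_p, market_p, won in plays:
        group = classify(model_p, market_p)
        counts[group] += 1
        wins[group] += int(won)
    # Groups with no plays are omitted rather than reported as 0%.
    return {g: wins[g] / counts[g] for g in counts if counts[g] > 0}
```

Comparing the two hit rates side by side is the audit: a FLIP rate persistently below the AGREE rate would point at the edge filter.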
Hit rate & ROI by edge bucket
All ROI numbers assume flat staking at -110 odds (52.4% breakeven win rate). Wilson 95% confidence intervals are printed alongside; small samples have very wide intervals and shouldn't be over-read.
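For reference, the arithmetic behind these columns can be sketched as follows. The Wilson score interval and the -110 payout are standard; the function names are just for this example, and pushes or voided bets are ignored.

```python
import math

WIN_PAYOUT = 100 / 110   # profit per unit staked on a winning -110 bet
BREAKEVEN = 110 / 210    # win rate where flat staking at -110 nets zero


def flat_roi(wins: int, losses: int) -> float:
    """ROI per unit staked, one unit on every play at -110."""
    n = wins + losses
    if n == 0:
        raise ValueError("no plays in this bucket")
    return (wins * WIN_PAYOUT - losses) / n


def wilson_interval(wins: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a hit rate (z = 1.959964 gives 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: every rate is still possible
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)
```

A bucket that went 50 for 100 has an interval of roughly 40% to 60%, which straddles the 52.4% breakeven; this is why thin buckets should not be over-read.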
Calibration curve
When the model says a team has a 60-65% chance to win, does that team actually win ~60-65% of the time? A bar below the predicted line means the model is overconfident in that band; a bar above means it is underconfident.
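A calibration curve of this kind can be built by bucketing predictions into bands and comparing each band's average prediction with its actual win rate. This is a minimal sketch, assuming 5-point bands; `calibration_table` and its output keys are hypothetical names, not the page's actual code.

```python
import math


def calibration_table(preds: list[float], outcomes: list[int],
                      width: float = 0.05) -> list[dict]:
    """Group predicted win probabilities into bands and compare to results.

    `outcomes` holds 1 for a win and 0 for a loss. A negative `gap` means
    the band won less often than predicted (overconfident); a positive gap
    means it won more often (underconfident).
    """
    n_bins = int(round(1 / width))
    bins: dict[int, list[tuple[float, int]]] = {}
    for p, won in zip(preds, outcomes):
        # Round before flooring so that 0.60 / 0.05 lands in band 12, not 11.
        idx = min(math.floor(round(p / width, 9)), n_bins - 1)
        bins.setdefault(idx, []).append((p, won))

    rows = []
    for idx in sorted(bins):
        members = bins[idx]
        mean_pred = sum(p for p, _ in members) / len(members)
        actual = sum(w for _, w in members) / len(members)
        rows.append({
            "band": (round(idx * width, 4), round((idx + 1) * width, 4)),
            "n": len(members),
            "mean_pred": mean_pred,
            "actual": actual,
            "gap": actual - mean_pred,
        })
    return rows
```

Bands with only a handful of plays will swing widely, so the gap is best read alongside the sample size in `n`.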