Methodology

Every model, every input, every receipt, explained.

Thesis Live

Bad Bets is a quantitative sports model with one rule: publish the receipts. Every pick is timestamped before games start. Every result is graded automatically from official stats APIs. Every win and every loss is logged in a public ledger. There is no cherry-picking, no "we had it earlier", no Telegram-screenshot ROI claims. If the model is wrong, the ledger says so the next morning.

We publish three model categories:

  • MLB game picks — full-game moneyline edges driven by Statcast + park + weather + lineup + umpire (v0.7)
  • NFL game picks — Elo + QB-EPA + line movement signal (v0.2)
  • MLB player props — Poisson rate model on HR, hits, and pitcher strikeouts with Bayesian shrinkage, park, weather, lineup spot (v0.3)

Every model runs as a closed loop: data in → math → edge out → wait for the actual outcome → grade → calibrate → adjust. No human discretion overrides the model.

MLB Game Model · v0.7 Live

Each morning we project both teams' run totals, convert to win probabilities via Pythagorean expectation, then compare against the devigged consensus moneyline to find the edge.

Inputs

  • Starting pitcher — FIP, K%, BB%, GB%, plus xwOBA-against (Statcast)
  • Pitcher fatigue — days rest + recent pitch counts
  • Offense — team wOBA, ISO, lineup wOBA (when lineup posted)
  • Park — per-stadium runs/HR factors from 3-yr FanGraphs aggregates
  • Weather — wind speed/direction relative to outfield bearing + temperature
  • Umpire — strike-zone tightness and HBP propensity nudges
  • Bullpen — opponent bullpen FIP weighted by expected reliever IP

Math

runs_estimate = (offense_wOBA / league_wOBA) × base_runs × park_runs × weather × pitcher_suppression
win_prob_home = R_home^1.83 / (R_home^1.83 + R_away^1.83)   (Pythagorean exponent 1.83)
edge = model_win_prob − fair_moneyline_prob
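As a rough illustration of the last two steps, the sketch below converts projected run totals into a home win probability using the 1.83 exponent stated above, then subtracts a devigged market probability. The function names and the example numbers (4.8 and 4.2 projected runs, a 0.52 fair market probability) are hypothetical and are not taken from the production code.

```python
def pythag_win_prob(runs_home: float, runs_away: float, exponent: float = 1.83) -> float:
    """Pythagorean expectation: home win probability from projected run totals."""
    home = runs_home ** exponent
    away = runs_away ** exponent
    return home / (home + away)


def edge(model_win_prob: float, fair_moneyline_prob: float) -> float:
    """Edge in probability points: model probability minus devigged market probability."""
    return model_win_prob - fair_moneyline_prob


# Hypothetical inputs, for illustration only.
p_home = pythag_win_prob(4.8, 4.2)
print(f"home win prob: {p_home:.3f}, edge vs 0.52 market: {edge(p_home, 0.52):+.3f}")
```

Equal run projections give exactly 50%, and the home and away probabilities always sum to 1, which is a quick sanity check on any run-projection change.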

Frequency

Morning slate fires at 8 AM ET (GitHub Actions). Re-runs when official lineups post (~2-4 PM ET) to incorporate confirmed batting orders. Graded automatically via pg_cron the moment each game finalizes.

NFL Game Model · v0.2 Beta

Game-level moneyline + spread + total model. v0.2 is the first version to replace team-Elo as the primary feature with QB-level Expected Points Added (EPA). NFL is meaningfully harder than MLB to model because of sample size (17 games vs 162) and lineup turnover; we expect smaller, less-reliable edges than the MLB game model.

Inputs

  • QB-EPA — rolling 8-game per-play EPA for the starting QB (replaces team Elo)
  • Defensive EPA-allowed — opponent's per-play EPA-against, weighted by recent games
  • Line movement — direction of close vs open; sharp money usually drives early reverse moves
  • Home-field — fixed +1.7 point baseline, modulated by travel + altitude (Denver)
  • Rest — days since last game (Thursday/Monday/bye)

Roadmap

v0.3 (in progress) replaces the Elo+EPA blend with an XGBoost model trained on 15 years of nflverse play-by-play. v0.4 adds an LLM-as-feature-extractor layer that reads beat-reporter tweets and injury reports to surface signal not yet in the box score (illness, weather changes, late scratches).

MLB Props Model · v0.3 Live

Player-level prop edges on three markets: home runs, hits, and pitcher strikeouts. Built on a Poisson rate model: each player has a projected per-game rate λ, and we compute the over/under probability against the line.

Markets

  • ⚾ Home Runs — lines like 0.5, 1.5
  • 🎯 Hits — lines like 0.5, 1.5, 2.5
  • 🔥 Pitcher Strikeouts — lines from 3.5 (relievers) to 9.5 (aces)

Rate equations

HR rate   = barrel_rate × HR_per_barrel × est_PA × park_HR × weather × pitcher_HR_adj
Hits rate = xBA_shrunk × est_PA × ab_per_pa × park_BA × pitcher_BA × weather_BA
K rate    = base_K_per_PA × PA_per_IP × est_IP × pitcher_quality × opp_K_resistance × park_K

v0.3 improvements

The v0.1 model had a fatal flaw: tiny-sample players (Tommy Troy with 8 career PAs) showed up with +57% edges because their unstable xwOBA was treated at face value. v0.3 addresses this with five upgrades:

  • Bayesian shrinkage — every observed xwOBA / xBA is blended toward league average using a 200-PA prior. A 50-PA player ends up 20% themselves and 80% league avg. A 600-PA player ends up 75% themselves. This kills fake locks from hot starts.
  • Park factors — per-stadium HR/Hits/K multipliers from the mlb_park_factors module (Coors ×1.20 HR, Petco ×0.89 HR). Domed parks tagged as climate-controlled.
  • Weather — live wind speed/direction + temperature from mlb_weather (Open-Meteo). Wind component along the CF-bearing axis adjusts HR rate ±35%; temperature adjusts ±10%.
  • Lineup spot — uses real batting_order when populated (leadoff ~4.65 PA, 9-hole ~3.80 PA). Default 4.20 when order unknown.
  • Platoon scaffold — schema ready for LHB vs RHP / RHB vs LHP multipliers once pitcher_hand is populated in upstream lineup data.
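The shrinkage weights quoted in the list above follow from a simple ratio: a player's own weight is PA / (PA + 200). The sketch below reproduces that arithmetic. The function names are illustrative, and the .315 league average in the example comes from the glossary; the production module may organize this differently.

```python
PRIOR_PA = 200  # prior strength in plate appearances, as stated in the text


def self_weight(pa: int, prior_pa: int = PRIOR_PA) -> float:
    """Share of the blended estimate that comes from the player's own sample."""
    return pa / (pa + prior_pa)


def shrink(observed: float, pa: int, league_avg: float, prior_pa: int = PRIOR_PA) -> float:
    """Blend an observed rate (xwOBA, xBA, ...) toward the league average."""
    w = self_weight(pa, prior_pa)
    return w * observed + (1.0 - w) * league_avg


# A 50-PA hitter with a .400 observed xwOBA against a .315 league average:
print(f"weight on player: {self_weight(50):.2f}")        # 50 / 250
print(f"shrunk xwOBA:     {shrink(0.400, 50, 0.315):.3f}")
print(f"weight at 600 PA: {self_weight(600):.2f}")       # 600 / 800
```

With these weights a hot 50-PA start moves the estimate only a fifth of the way from league average toward the observed number.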

Poisson conversion

P(over X.5) = 1 − Σ_(k=0..floor(X)) (e^−λ × λ^k / k!)
e.g. P(over 0.5 HR) = 1 − e^−λ
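A minimal sketch of this conversion, assuming half-point lines only (so pushes cannot occur). The function name is illustrative and not taken from the production code.

```python
import math


def poisson_p_over(lam: float, line: float) -> float:
    """P(count > line) under a Poisson(lam) model, for half-point lines like 0.5 or 1.5."""
    k_max = math.floor(line)  # largest count that still lands on the "under" side
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(k_max + 1))
    return 1.0 - cdf


# For a 0.5 HR line the sum has a single term, so the result reduces to 1 - e^-lam.
lam = 0.2  # hypothetical projected HR rate per game
print(f"P(over 0.5 HR) = {poisson_p_over(lam, 0.5):.4f}")
print(f"closed form    = {1 - math.exp(-lam):.4f}")
```

The under probability is the complement, 1 − P(over), because a half-point line has no push outcome.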

Edge selection

edge_over  = model_p_over − fair_market_over    (Shin-devigged)
edge_under = model_p_under − fair_market_under
edge_side  = whichever positive edge is larger
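The selection rule can be sketched as below. The fair market probabilities are taken as inputs, so the Shin devig step itself is not shown. The function name and the choice to return None when neither side has a positive edge are assumptions made for illustration.

```python
from typing import Optional, Tuple


def pick_side(model_p_over: float, fair_over: float, fair_under: float) -> Optional[Tuple[str, float]]:
    """Return (side, edge) for the larger positive edge, or None if neither side is positive."""
    model_p_under = 1.0 - model_p_over  # valid for half-point lines, where no push is possible
    edge_over = model_p_over - fair_over
    edge_under = model_p_under - fair_under
    side, best = max((("over", edge_over), ("under", edge_under)), key=lambda pair: pair[1])
    return (side, best) if best > 0 else None


# Hypothetical numbers: model says 60% over, devigged market says 52% over / 48% under.
print(pick_side(0.60, 0.52, 0.48))
```

When the devigged probabilities sum to 1, the two edges are mirror images of each other, so at most one side can be positive.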

Lock tier

  • 🔒 Lock — edge ≥ 12%
  • 💪 Strong — edge 7-12%
  • 👌 Solid — edge 4-7%
  • Lean — edge 2-4% (shown but not Discord-posted)
  • Skip — edge < 2% (filtered out)
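The tier thresholds above map to a simple cascade. In this sketch each boundary value is assigned to the higher tier (an edge of exactly 12% is a Lock, exactly 7% is Strong), which matches the "≥" wording in the glossary; the function name is hypothetical.

```python
def tier(edge: float) -> str:
    """Map an edge (as a fraction, e.g. 0.08 for 8%) to its published tier name."""
    if edge >= 0.12:
        return "Lock"
    if edge >= 0.07:
        return "Strong"
    if edge >= 0.04:
        return "Solid"
    if edge >= 0.02:
        return "Lean"   # shown on the site but not posted to Discord
    return "Skip"       # filtered out entirely


for e in (0.15, 0.08, 0.05, 0.03, 0.01):
    print(f"{e:.0%} -> {tier(e)}")
```

Edges computed by floating-point subtraction can land a hair under a threshold, so a real implementation might round the edge before comparing.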

Data sources

  • Lines — The Odds API consensus across US books, using the median across bookmakers per line for stability
  • Player metrics — Baseball Savant Statcast leaderboards (free CSV endpoint)
  • Game schedule + matchup — official MLB Stats API (statsapi.mlb.com)
  • Weather — Open-Meteo (free, no auth required)
  • Results — MLB Stats API live game feed, per-player box scores

Cadence

Lines refresh every 30 min from 10 AM to 6 PM ET. Model re-scores on every line refresh. Discord post fires once at 10:30 AM ET. Grader runs every 15 min from 10 PM to 2 AM ET as games finalize.

Grading & Calibration Live

A model is only as good as its receipts. We grade every pick automatically from official MLB / NFL stat APIs the moment games finalize. Results land in the public ledger pages and inform future calibration.

Game pick grading

MLB game picks are graded by pg_cron polling statsapi.mlb.com every 15 minutes during play. NFL picks are graded the morning after each Sunday/Monday slate via a parallel pg_cron job pulling nflverse data.

Prop grading

The props grader (mlb_props_grade.py) fetches each completed game's box score and extracts per-player HR / hits / pitcher_strikeouts counts. For each prop row matching (slate_date, player_name):

actual_value = HR | hits | K count from box score
result_side  = 'over' if actual > line, 'under' if actual < line, 'push' if ==
edge_correct = (edge_side == result_side) | null on push
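The grading rule can be written as a small pure function. This is a sketch of the logic described above, not the contents of mlb_props_grade.py; the function name and return shape are assumptions.

```python
from typing import Optional, Tuple


def grade_prop(actual_value: int, line: float, edge_side: str) -> Tuple[str, Optional[bool]]:
    """Return (result_side, edge_correct); edge_correct is None on a push."""
    if actual_value > line:
        result_side = "over"
    elif actual_value < line:
        result_side = "under"
    else:
        result_side = "push"  # only reachable on whole-number lines
    edge_correct = None if result_side == "push" else (edge_side == result_side)
    return result_side, edge_correct


print(grade_prop(2, 1.5, "over"))   # two hits against a 1.5 line
print(grade_prop(0, 0.5, "over"))   # no HR against a 0.5 line
```

Keeping pushes as None rather than False means they drop out of hit-rate and calibration counts instead of being scored as losses.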

Calibration

Calibration is the question: when the model says 60% over, does it hit 60% of the time? The Props Ledger buckets decided picks by model probability and shows the actual hit rate per bucket. If the green bar sits at or above the amber tick, the model is honest. If green is systematically below amber, the model is overconfident and the Kelly stake sizing should be shrunk.
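A minimal sketch of the bucketing step, reading the green bar as the actual hit rate and the amber tick as the model's stated probability (an interpretation of the description above). The ten equal-width buckets, the function name, and the sample picks are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def calibration_table(
    picks: Iterable[Tuple[float, bool]], n_buckets: int = 10
) -> Dict[float, Tuple[int, float, float]]:
    """Group (model_prob, hit) pairs into equal-width probability buckets.

    Returns {bucket_floor: (count, mean_model_prob, actual_hit_rate)}.
    """
    buckets: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    for prob, hit in picks:
        idx = min(int(prob * n_buckets), n_buckets - 1)  # prob == 1.0 goes in the top bucket
        buckets[idx].append((prob, hit))

    table = {}
    for idx in sorted(buckets):
        rows = buckets[idx]
        n = len(rows)
        mean_prob = sum(p for p, _ in rows) / n
        hit_rate = sum(1 for _, h in rows if h) / n
        table[idx / n_buckets] = (n, mean_prob, hit_rate)
    return table


# Hypothetical decided picks: (model probability of the chosen side, did it hit?)
sample = [(0.62, True), (0.65, True), (0.68, False), (0.55, True), (0.52, False)]
for floor, (n, mean_prob, hit_rate) in calibration_table(sample).items():
    print(f"bucket {floor:.1f}+: n={n}, model={mean_prob:.3f}, actual={hit_rate:.3f}")
```

Pushes should be excluded before calling this, since they have no hit or miss outcome.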

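Once we have 200+ decided picks (≈ 2 weeks of data), we plan to apply isotonic regression to remap raw model probabilities to calibrated probabilities. The underlying model stays the same; only the probabilities fed into the edge and Kelly calculations change, which should make the published edges more reliable.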

Risk Management Live

Every published unit count uses fractional Kelly with a 0.25× multiplier (quarter Kelly). Full Kelly is mathematically optimal for compound growth but practically insane: a string of bad luck can wipe out a bankroll. The 0.25× multiplier means we bet a quarter of what full Kelly would suggest, trading slower expected growth for dramatically less variance.

Kelly formula

f* = (bp − q) / b
where
  p = model win probability
  q = 1 − p
  b = decimal odds − 1
units = round(0.25 × f* × 100)   // expressed as "units" with 1u = 1% bankroll
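The formula translates directly into code. Two details in this sketch are assumptions rather than documented behavior: units are rounded to one decimal place (the site publishes values like "3.2u"), and a negative Kelly fraction returns 0 units instead of a negative stake. The function name is hypothetical.

```python
def kelly_units(p: float, decimal_odds: float, fraction: float = 0.25) -> float:
    """Quarter-Kelly stake in units, where 1 unit = 1% of bankroll."""
    b = decimal_odds - 1.0           # net payout per 1 staked
    q = 1.0 - p                      # probability of losing
    f_star = (b * p - q) / b         # full-Kelly fraction of bankroll
    if f_star <= 0:
        return 0.0                   # assumption: no bet when the edge is not positive
    return round(fraction * f_star * 100, 1)  # assumption: one-decimal rounding


# Hypothetical play: model says 55% at even money (decimal odds 2.00).
# Full Kelly is 10% of bankroll, so quarter Kelly is 2.5 units.
print(kelly_units(0.55, 2.00))
```

The stake depends on the offered odds as well as the model probability, which is why two plays with the same edge can carry different unit counts.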

Practical caps

  • Max 5.0 units on any single play, regardless of stated edge
  • Any play with edge < 4% is hidden (not "solid") to avoid noise plays
  • Any play with model_prob > 95% or < 5% is dropped (model probably broken on that row)
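The three caps above can be applied as a final filter on the raw Kelly output. This is a sketch with hypothetical names; returning None to signal "do not publish" is an assumption about how a dropped or hidden play might be represented.

```python
from typing import Optional

MAX_UNITS = 5.0                      # hard ceiling on any single play
MIN_EDGE = 0.04                      # plays below this edge are hidden
PROB_FLOOR, PROB_CEIL = 0.05, 0.95   # outside this range the row is treated as suspect


def apply_caps(units: float, edge: float, model_prob: float) -> Optional[float]:
    """Return the publishable unit count, or None if the play should not be published."""
    if model_prob > PROB_CEIL or model_prob < PROB_FLOOR:
        return None                  # model probably broken on this row
    if edge < MIN_EDGE:
        return None                  # noise play
    return min(units, MAX_UNITS)


print(apply_caps(7.3, 0.15, 0.60))   # large stated edge, capped at 5.0 units
print(apply_caps(2.0, 0.03, 0.55))   # edge under 4%, not published
```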

Bankroll discipline

1 unit = 1% of your bankroll. If you have $1,000, 1u = $10. If we publish "3.2u" we mean 3.2% of your roll. The numbers ONLY work if you scale them to YOUR bankroll. Betting flat $100 per play when we publish wildly different unit recommendations will destroy ROI.

Glossary Live

Terms you'll see across the site, defined precisely.

xwOBA
"Expected weighted on-base average" — what a batter's wOBA should have been based on launch angle + exit velocity, stripping out defense and luck. League avg ≈ .315. The single most predictive offensive metric in public baseball analytics.
xBA
Expected batting average from Statcast. Same idea as xwOBA but just for batting average. League avg ≈ .245.
Barrel rate
% of batted balls hit at the optimal launch-angle / exit-velocity combo for a HR. League avg ≈ 8.5%. Strongest leading indicator of power outbursts.
FIP
"Fielding Independent Pitching" — pitcher rating from K, BB, HBP, HR only (ignores balls in play). Strips out defense + park noise. Lower = better.
Poisson
Probability distribution for "rare events at a known average rate". HR per game, K per start, and hits per game all approximately follow Poisson once you know λ.
Devig
Removing the bookmaker's juice (vig) from posted odds to recover the implied probability the book is asserting. Standard methods: proportional, power, and Shin. We use Shin.
Edge %
edge = model_prob − fair_market_prob. A +5% edge means the model thinks the true probability is 5 percentage points higher than the devigged market price. NOT the same as ROI per bet.
Kelly
Optimal bet-sizing formula for compound bankroll growth. We use 0.25× Kelly (quarter Kelly) for variance reduction. 1u = 1% bankroll.
Bayesian shrinkage
Blending a player's observed stat with the league prior, weighted by sample size. A 50-PA player's xwOBA is mostly noise, so we pull it heavily toward league average. A 600-PA player has signal, so we trust it more.
Park factor
Multiplier for how a stadium affects a stat relative to league average. Coors Field's 1.20 HR factor means HRs are hit 20% more often there.
Run type
Tag indicating WHEN a pick was generated: morning (pre-lineup), lineup (after official lineups), injury (mid-day update), final (closest to first pitch or kickoff).
Lock / Strong / Solid
Tier thresholds on edge magnitude. Lock ≥12%, Strong ≥7%, Solid ≥4%. Higher tier = larger model edge AND larger Kelly stake.

Limitations & What We Won't Do Honest

A model that pretends to be perfect is a model you can't trust. Here's where ours has gaps and where you should expect bumps.

Known weaknesses

  • Sample size on small markets — a player with under 100 career PAs has unstable Statcast; even with shrinkage, edges on them are noisier than on established players.
  • Reliever Ks — pitcher_strikeouts model assumes starter usage; relievers get scored but their proj_IP estimate is rough.
  • Late lineup changes — if a star is scratched at 6:55 PM, we may have published a pre-lineup pick that becomes stale by first pitch.
  • Doubleheaders — second game of a DH has compounded fatigue + bullpen shifts the model doesn't fully capture yet.
  • Postseason — model is trained on regular-season patterns; relief usage and lineup management shift in October.

Things we will never do

  • Delete losing picks. Every result, win or loss, stays in the ledger.
  • Backfill "we had this earlier" claims. All picks are timestamped before games start. If we didn't publish it pre-game, it doesn't count.
  • Charge for picks. The picks are free. Always.
  • Sell guaranteed locks. No such thing exists in sports betting.
  • Recommend chasing losses. If the model has a bad week, the answer is not "bigger bets to make it back." The answer is unit discipline + waiting.

Sportsbook adaptation

Books are sharp and adapt fast. The +15% edges visible today on heavily-bet markets will likely tighten to +5% within months as books adjust their pricing. The durable strategy is constantly moving toward less-efficient markets — alt lines, lesser-traded players, AAA call-ups, weird game-time changes. We'll add those over time and call out when we do.

21+ / responsible play

Bad Bets is model output, not betting advice. Sports betting is legal only in select jurisdictions and only for 21+. If betting stops being fun, call 1-800-GAMBLER.