Bad Bets — Methodology

Thesis Live

Bad Bets is a quantitative sports model with one rule: publish the receipts. Every pick is timestamped before games start. Every result is graded automatically from official stats APIs. Every win and every loss is logged in a public ledger. There is no cherry-picking, no "we had it earlier", no Telegram-screenshot ROI claims. If the model is wrong, the ledger says so the next morning.

We publish models across four sports:

MLB game picks v0.8 — Statcast + park + weather + lineup + umpire, plus rolling momentum, bullpen fatigue, and Bayesian shrinkage
NFL game picks v0.2 — Elo + QB-EPA + line movement signal
NBA game picks v0.1 — pace + rest + injury blend; live through opening night of the 2026-27 season
CFB game picks v0.1 — SP+ blend with portal-transfer adjustments; weekly during the regular season
MLB player props v0.3 — Poisson rate model on HR, hits, and pitcher strikeouts with Bayesian shrinkage, park, weather, lineup spot
NFL / NBA / CFB props Coming — passing/rushing/receiving (NFL), points/rebs/assts (NBA), and rushing/receiving (CFB) shipping with each league's regular season

Every model runs as a closed loop: data in → math → edge out → wait for the actual outcome → grade → calibrate → adjust. No human discretion overrides the model.

Every pick card carries a "Why we like it" rail explaining which signals drove the edge (momentum, bullpen fatigue, umpire zone tilt, weather, park, etc.). Every game shows a projected final score from the model with a live pace tracker that compares the actual score to the projection in real time.

What's New in v0.8 Shipped

v0.8 is the biggest model update since launch. Four new signal layers stack on top of v0.7's Statcast-aware base, closing roughly 70% of the public-vs-sharp pricing gap while staying on a free data stack with no paid APIs. It also ships a wave of UX work that surfaces the model's reasoning to the user.

Model layers (new)

🔥 Momentum — last 10 games per team: streak, run differential, runs-per-game vs league. Adjusts each team's run estimate by ±8%.
💪 Bullpen fatigue — last 3 days of reliever pitch counts. A burned pen (≥240 pitches L3) inflates opponent runs by up to 6%. A rested pen (≤90 pitches L3) suppresses them.
🧑‍⚖️ Umpire tendency — home-plate umpire's K%/BB% delta vs league. Affects total runs by up to ±5%. Seeded with 20 active MLB umps; database grows nightly.
📐 Bayesian shrinkage — replaces v0.6's hard return 50.0 fallback. A team with only 50 PAs of fresh data gets blended 80% toward league average; a 600-PA team gets trusted at 75% of its own number. Kills fake-edge plays from small samples.

User-facing (new)

Projected final score on every pick card — rounded whole runs with the model's decimal projection underneath ("SF 5 · PROJECTED 4.8 – 5.6 · ATL 6").
Live pace tracker on every live tile — compares actual runs through inning N to the model's pre-game projection. "🔥 Tracking over · +2.2 vs proj" or "🥶 Tracking under · -1.4 vs proj".
Suspended-game handling — rain-delayed games no longer render as "LIVE · bot 2" at 7 AM. Shows "⏸ Suspended · resumes 2:05 PM" with the partial score preserved.
VS matchup modal — full-screen pitcher-vs-hitter breakdown, bullpen comparison, live momentum strip, accessible from any pick card.
Season leaders — top-25 leaderboards per sport per category with player photos and betting hooks. Fed by a nightly importer.
Live ticket tracking — real-time per-leg progress bar for any ticket the user logs.
PWA install — Bad Bets now runs as a phone "app" via Add to Home Screen. iOS safe-area handling tuned so the status bar never blocks the sign-in pill.

Reasoning, surfaced

Every pick card now shows a "Why we like it" rail with the three strongest reasons the model liked the bet. v0.8 added three new builders to this rail: 🔥 momentum ("ATL hot: 8-2 L10, 5.7 RPG, +5% model uplift"), 💪 bullpen ("CHC pen burned: 288 relief pitches L3 — favors COL runs"), and 🧑‍⚖️ umpire ("HP umpire Pat Hoberg pitcher-friendly zone, runs ↓2.3% vs league"). When these signals are decisive they take precedence over weather/park reasons.

MLB Game Model · v0.8 Live

A four-layer stack. Each layer wraps the previous one and adds a signal. v0.8 is the production version — v0.6 (base Pythag), v0.7 (Statcast xwOBA uplift), and v0.8 (momentum + bullpen + umpire + shrinkage) compose at runtime via Python monkey-patching, so we can A/B individual layers by flag.

Base inputs (v0.6 → v0.7)

Starting pitcher — FIP, K%, BB%, GB%, plus xwOBA-against (Statcast)
Pitcher fatigue — days rest + recent pitch counts
Offense — team wOBA, ISO, lineup wOBA (when lineup posted)
Park — per-stadium runs/HR factors from 3-yr FanGraphs aggregates
Weather — wind speed/direction relative to outfield bearing + temperature
Bullpen quality — opponent bullpen FIP weighted by expected reliever IP

v0.8 signal layers (new)

Momentum (mlb_momentum.py) — pulls last 10 games from MLB Stats API, computes wins/losses, streak, run differential, runs-per-game. Blends two signals (run-diff per game × 4%, RPG vs league × 5%), clamps to ±8%, multiplies the team's own run estimate.
Bullpen fatigue (mlb_bullpen_fatigue.py) — fetches last 3 days of box scores per team, sums reliever pitches (excluding starters). ≤90 pitches L3 = rested (mult 0.94 on opp runs); ≥240 = burned (mult 1.06). Linear interpolation between.
HP umpire (mlb_umpire.py) — looks up tonight's home-plate ump in data/umpire_tendencies.json. K%/BB% delta vs league gets converted to a runs multiplier (+1pp K% ≈ -2.3% runs, +1pp BB% ≈ +2.7% runs). Clamped to ±5%.
Bayesian shrinkage (mlb_shrinkage.py) — replaces v0.6's hard 50.0 fallback. shrink_score(observed, sample_size, prior=50, strength=80) blends each team's offense score and each pitcher's quality toward league average proportional to sample size. Strength 200 for offense (≈ half a month of PAs), 60 for pitchers (≈ 10 starts).
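The three multiplier layers above can be sketched as follows. This is a minimal illustration, not the code in mlb_momentum.py, mlb_bullpen_fatigue.py, or mlb_umpire.py: the function names and signatures are hypothetical, and the additive form of the momentum blend is an assumption, since the text gives only the two coefficients and the clamp.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def momentum_mult(run_diff_per_game, rpg, league_rpg):
    # Assumed blend: 4% per run of per-game run differential plus
    # 5% per run of RPG above or below league, clamped to +/-8%.
    raw = 0.04 * run_diff_per_game + 0.05 * (rpg - league_rpg)
    return 1.0 + clamp(raw, -0.08, 0.08)


def bullpen_fatigue_mult(relief_pitches_l3):
    # <=90 pitches over the last 3 days: rested (0.94 on opponent runs).
    # >=240 pitches: burned (1.06). Linear interpolation in between.
    if relief_pitches_l3 <= 90:
        return 0.94
    if relief_pitches_l3 >= 240:
        return 1.06
    frac = (relief_pitches_l3 - 90) / (240 - 90)
    return 0.94 + frac * (1.06 - 0.94)


def umpire_mult(k_delta_pp, bb_delta_pp):
    # +1pp K% is worth about -2.3% runs, +1pp BB% about +2.7% runs,
    # with the combined effect clamped to +/-5%.
    raw = -0.023 * k_delta_pp + 0.027 * bb_delta_pp
    return 1.0 + clamp(raw, -0.05, 0.05)
```

Under these assumptions a pen at 165 relief pitches sits at the midpoint of the interpolation and returns a neutral multiplier of about 1.0.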

Math

runs_estimate = (offense_wOBA / league_wOBA) × base_runs × park_runs × weather × pitcher_suppression × momentum_mult × fatigue_mult × umpire_mult
win_prob_home = R_home^1.83 / (R_home^1.83 + R_away^1.83) (Pythagorean expectation, fixed exponent 1.83)
edge = model_win_prob − fair_moneyline_prob
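As a worked sketch of the math above, the snippet below chains the run estimate, the fixed-exponent win probability, and the edge. Function names and default arguments are illustrative assumptions, not the production code.

```python
def runs_estimate(offense_woba, league_woba, base_runs,
                  park_runs=1.0, weather=1.0, pitcher_suppression=1.0,
                  momentum_mult=1.0, fatigue_mult=1.0, umpire_mult=1.0):
    """Multiply the wOBA ratio through every adjustment factor."""
    return ((offense_woba / league_woba) * base_runs * park_runs * weather
            * pitcher_suppression * momentum_mult * fatigue_mult * umpire_mult)


def win_prob_home(r_home, r_away, exponent=1.83):
    """Pythagorean win probability from projected runs."""
    home = r_home ** exponent
    away = r_away ** exponent
    return home / (home + away)


def edge(model_win_prob, fair_moneyline_prob):
    """Model probability minus the devigged market probability."""
    return model_win_prob - fair_moneyline_prob


# Example with the projection shown on the pick card: away 4.8, home 5.6.
p_home = win_prob_home(5.6, 4.8)
print(f"home win prob: {p_home:.3f}")
```

With projected runs of 5.6 (home) and 4.8 (away), the formula gives a home win probability of roughly 0.57.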

Outputs persisted per game

Each pick row in picks carries the projected away/home runs, model probabilities, edge%, calibrated edge%, direction (AGREE/FLIP), lineup status, Kelly units, and the v0.8 reasons JSONB.

Frequency

Morning slate fires at 10 AM ET (GitHub Actions slate-morning.yml). Re-runs when official lineups post (~4 PM ET, lineup-afternoon.yml) to incorporate confirmed batting orders. Graded automatically via pg_cron polling MLB Stats API every 15 minutes during play.

NFL Game Model · v0.2 Beta

Game-level moneyline + spread + total model. v0.2 is the first version to replace team-Elo as the primary feature with QB-level Expected Points Added (EPA). NFL is meaningfully harder than MLB to model because of sample size (17 games vs 162) and lineup turnover; we expect smaller, less-reliable edges than the MLB game model.

Inputs

QB-EPA — rolling 8-game per-play EPA for the starting QB (replaces team Elo)
Defensive EPA-allowed — opponent's per-play EPA-against, weighted by recent games
Line movement — direction of close vs open; sharp money usually drives early reverse moves
Home-field — fixed +1.7 point baseline, modulated by travel + altitude (Denver)
Rest — days since last game (Thursday/Monday/bye)

Roadmap

v0.3 (in progress) replaces the Elo+EPA blend with an XGBoost model trained on 15 years of nflverse play-by-play. v0.4 adds an LLM-as-feature-extractor layer that reads beat-reporter tweets and injury reports to surface signal not yet in the box score (illness, weather changes, late scratches).

NBA Game Model · v0.1 Beta

Pace-adjusted offensive/defensive ratings with an injury layer for star availability. NBA opens its real season in late October; until then the model runs as preseason with reduced confidence bands.

Inputs

Offensive rating — points per 100 possessions, 10-game rolling
Defensive rating — opponent points per 100, 10-game rolling
Pace — possessions per 48 minutes; multiplies both teams' expected scoring
Rest — days since last game (B2B nights drop offensive efficiency ≈2%)
Star availability — injury report integration; missing a 25%+ usage rate star drops team rating by ~4 pts
Home court — fixed +3.0 point baseline, slight altitude tweak for DEN/UTA

Math

exp_score_team = (team_ORtg × opp_DRtg / 100) × pace / 100 × rest_mult × availability_mult
edge = model_win_prob − fair_market_prob

Roadmap

v0.2 adds player-level usage-adjusted ORtg/DRtg (currently team-level only) and a "blowout discount" so projected 30-point wins don't over-anchor moneyline edge. v0.3 adds 4-factor breakdown (eFG%, TOV%, ORB%, FTr).

CFB Game Model · v0.1 Beta

Built on Bill Connelly's SP+ public ratings as the base, blended with a portal transfer adjustment layer and a "team continuity" feature (returning starter %). Publishes weekly during the regular season; goes live when the 2026 season opens on Labor Day weekend.

Inputs

SP+ rating — Connelly's adjusted efficiency rating (base feature)
Transfer portal — incoming/outgoing transfers weighted by snap counts; portal gains or losses move SP+ by up to ±4 points
Continuity — returning starter % vs prior season (low continuity teams tend to underperform preseason ratings until October)
Pace + style — plays per game and EPA per play
Home/away — fixed +2.5 point baseline; +1 extra for "iconic atmosphere" games (LSU at night, Beaver Stadium whiteout, etc.)
Spread / total markets — currently focused on spread and total; moneyline edge often vanishes when a 28-point favorite is priced -3500

Frequency

Weekly Wednesday refresh once Tuesday transfer portal moves clear. Live updates for major Friday injury news. Saturday morning edge sheet published 4 hours before first kickoff.

MLB Props Model · v0.3 Live

Player-level prop edges on three markets: home runs, hits, and pitcher strikeouts. Built on a Poisson rate model: each player has a projected per-game rate λ, and we compute the over/under probability against the line.

Markets

⚾ Home Runs — lines like 0.5, 1.5
🎯 Hits — lines like 0.5, 1.5, 2.5
🔥 Pitcher Strikeouts — lines from 3.5 (relievers) to 9.5 (aces)

Rate equations

HR rate = barrel_rate × HR_per_barrel × est_PA × park_HR × weather × pitcher_HR_adj
Hits rate = xBA_shrunk × est_PA × ab_per_pa × park_BA × pitcher_BA × weather_BA
K rate = base_K_per_PA × PA_per_IP × est_IP × pitcher_quality × opp_K_resistance × park_K

v0.3 improvements

The v0.1 model had a fatal flaw: tiny-sample players (Tommy Troy with 8 career PAs) showed up with +57% edges because their unstable xwOBA was treated at face value. v0.3 addresses this with five upgrades:

Bayesian shrinkage — every observed xwOBA / xBA is blended toward league average using a 200-PA prior. A 50-PA player ends up 20% themselves and 80% league avg. A 600-PA player ends up 75% themselves. This kills fake locks from hot starts.
Park factors — per-stadium HR/Hits/K multipliers from the mlb_park_factors module (Coors ×1.20 HR, Petco ×0.89 HR). Domed parks tagged as climate-controlled.
Weather — live wind speed/direction + temperature from mlb_weather (Open-Meteo). Wind component along the CF-bearing axis adjusts HR rate ±35%; temperature adjusts ±10%.
Lineup spot — uses real batting_order when populated (leadoff ~4.65 PA, 9-hole ~3.80 PA). Default 4.20 when order unknown.
Platoon scaffold — schema ready for LHB vs RHP / RHB vs LHP multipliers once pitcher_hand is populated in upstream lineup data.
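The shrinkage rule in the list above reduces to a single weighted average. The sketch below assumes the standard form weight = n / (n + strength); the function names are hypothetical and the league xwOBA of .315 is taken from the glossary.

```python
def own_weight(sample_size, strength=200):
    """Share of the estimate that comes from the player's own data."""
    return sample_size / (sample_size + strength)


def shrink_toward_prior(observed, sample_size, prior_mean, strength=200):
    """Blend an observed rate with the league prior by sample size."""
    w = own_weight(sample_size, strength)
    return w * observed + (1.0 - w) * prior_mean


# An 8-PA player with a .450 observed xwOBA against a .315 league prior
# keeps under 4% of his own number.
print(round(shrink_toward_prior(0.450, 8, 0.315), 3))
```

A 50-PA player keeps 50 / 250 = 20% of his own number and a 600-PA player keeps 600 / 800 = 75%, matching the figures quoted above.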

Poisson conversion

P(over X.5) = 1 − Σ_(k=0..floor(X)) (e^-λ × λ^k / k!)
e.g. P(over 0.5 HR) = 1 − e^-λ
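The conversion above is one minus the Poisson CDF evaluated at the whole number below the line. A minimal sketch using only the standard library follows; the function name is hypothetical.

```python
import math


def poisson_p_over(lam, line):
    """P(X > line) for X ~ Poisson(lam) and a half-point line such as 0.5."""
    k_max = math.floor(line)
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k)
              for k in range(k_max + 1))
    return 1.0 - cdf


# A hitter projected for 0.2 HR per game against a 0.5 line.
print(round(poisson_p_over(0.2, 0.5), 4))
```

For a 0.5 line the sum contains only the k = 0 term, so the result collapses to 1 − e^-λ as in the example above.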

Edge selection

edge_over = model_p_over − fair_market_over (Shin-devigged)
edge_under = model_p_under − fair_market_under
edge_side = whichever positive edge is larger
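A sketch of the edge selection follows, using Shin's method as it is commonly described: find the insider share z that makes the adjusted probabilities sum to one. Solving for z by bisection, the function names, and treating model_p_under as 1 − model_p_over (reasonable for half-point lines, which cannot push) are assumptions of this sketch, not a description of the production code.

```python
import math


def shin_fair_probs(decimal_odds):
    """Remove the vig from a set of decimal odds with Shin's method."""
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        # No overround to remove; just normalise.
        return [p / booksum for p in implied]

    def probs(z):
        return [(math.sqrt(z * z + 4.0 * (1.0 - z) * p * p / booksum) - z)
                / (2.0 * (1.0 - z)) for p in implied]

    lo, hi = 0.0, 0.999  # the probability sum falls as z rises
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2.0)


def pick_side(model_p_over, odds_over, odds_under):
    """Return (side, edge) for the larger positive edge, else (None, 0.0)."""
    fair_over, fair_under = shin_fair_probs([odds_over, odds_under])
    edge_over = model_p_over - fair_over
    edge_under = (1.0 - model_p_over) - fair_under
    side, best = max((("over", edge_over), ("under", edge_under)),
                     key=lambda item: item[1])
    return (side, best) if best > 0 else (None, 0.0)
```

For a two-way market, Shin's adjustment works out to subtracting half the overround from each side, so implied probabilities of 0.55 and 0.50 become 0.525 and 0.475.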

Lock tier

🔒 Lock — edge ≥ 12%
💪 Strong — edge 7-12%
👌 Solid — edge 4-7%
Lean — edge 2-4% (shown but not Discord-posted)
Skip — edge < 2% (filtered out)
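The tier ladder maps directly onto a threshold lookup. The sketch below takes the edge in percentage points and treats each lower bound as inclusive, which matches the glossary wording (Lock ≥12%, Strong ≥7%, Solid ≥4%); the function name is hypothetical.

```python
def edge_tier(edge_pct):
    """Map an edge in percentage points to its published tier."""
    if edge_pct >= 12:
        return "Lock"
    if edge_pct >= 7:
        return "Strong"
    if edge_pct >= 4:
        return "Solid"
    if edge_pct >= 2:
        return "Lean"
    return "Skip"
```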

Data sources

Lines — The Odds API consensus across US books, median-of-bookmakers per line for stability
Player metrics — Baseball Savant Statcast leaderboards (free CSV endpoint)
Game schedule + matchup — official MLB Stats API (statsapi.mlb.com)
Weather — Open-Meteo (free, no auth required)
Results — MLB Stats API live game feed, per-player box scores

Cadence

Lines refresh every 30 min from 10 AM to 6 PM ET. Model re-scores on every line refresh. Discord post fires once at 10:30 AM ET. Grader runs every 15 min from 10 PM to 2 AM ET as games finalize. Live grading also fires every 10 min during games so over-cashes settle before the final out.

NFL Props · v0.1 Coming Sep 2026

Shipping with kickoff week. Three markets to start:

Passing yards — QB rate model on YPA × expected attempts × opp pass-defense rating
Rushing yards — RB rate model on YPC × snap share × opp run-defense rating, weather-adjusted
Receiving yards — WR/TE target share × yards-per-target × opp coverage matchup

Underlying math is the same Poisson framework as MLB props with shrinkage, retuned for football. NFL is harder than MLB on small-sample plays (17 games vs 162) so we'll publish smaller edges with bigger confidence bands.

→ NFL Props page (preview)

NBA Props · v0.1 Coming Oct 2026

Ships with the 2026-27 season opener. Five markets:

Points — usage-rate × possessions × scoring efficiency × matchup adjustment
Rebounds — rebound rate × team pace × opponent miss rate
Assists — usage-share × team pace × team eFG% (assists scale with teammates making shots)
3-pointers made — 3PA × season 3P% with shrinkage (volatile market — bigger Bayesian prior)
Double-doubles — joint Poisson on points + rebounds (or points + assists for guards)

Pre-season we publish a single "preseason calibration slate" each week to test the rate model against actual outputs before live betting starts.

→ NBA Props page (preview)

CFB Props · v0.1 Coming Sep 2026

Smaller market than NFL/NBA props but exploitable because books trade these softer (fewer pro bettors care about Ole Miss tight ends). Markets:

Passing yards — QB rate × expected attempts × tempo × opp pass defense
Rushing yards — RB rate × snap share × game-script (favored teams run more)
Receiving yards — WR target share × air yards per target × matchup
Anytime TD scorer — RB + WR + TE goal-line usage

CFB props are tough in Weeks 1-3 because rosters change so much over the off-season. We'll publish smaller slates early and ramp up by mid-September.

→ CFB Props page (preview)

Predicted Scores v0.8

Every MLB pick card now shows a projected final score: rounded whole runs for the scoreboard headline, with the model's decimal projection underneath. The projection is the same away_runs and home_runs the model uses to derive its win probability — just surfaced instead of thrown away.

What you see

SF 5 · PROJECTED 4.8 – 5.6 · ATL 6

Why decimals matter

Rounding 4.8 to 5 and 5.6 to 6 makes the scoreboard read like a real prediction, but the underlying decimal carries critical info — a projection of 5.1 vs 4.9 is basically a coin flip, while 6.5 vs 4.5 is a real model lean. The decimal sub-line lets users see uncertainty without us needing a separate confidence band.

Derived signals

Projected total — home_runs + away_runs compared against the over/under line for total-bet edges
Projected margin — |home_runs − away_runs| compared against the run-line (-1.5/+1.5) for spread bets
Live pace — see Live Game Tracking section

Honest accuracy expectations

Run-prediction error is wider than win-prob error. The best public models post a run-total RMSE of about 1.4 runs. Bad Bets v0.8 currently lands at roughly 1.6; we expect that to drop to about 1.5 once we have 200+ graded games to calibrate against. Predicting an exact score (e.g., 5-3) is essentially a party trick; even Vegas can't hit those. The distribution (over/under, total, margin) is where the math is solid.

Live Game Tracking v0.8

The live page renders each of tonight's picks as a tile that updates every 30 seconds from the MLB Stats API. As the game progresses, the tile shows runs, inning, base state, count, last play, and three v0.8-only enhancements:

Pace tracker

Each live tile compares the actual runs through the current inning to the model's pre-game projection.

expected_through_inning_N = projected_total × (N − 1 + (isTopInning ? 0.5 : 1.0)) / 9
delta = actual_total − expected_through_inning_N
if delta > +0.6: 🔥 TRACKING OVER · +N.N vs proj
if delta < -0.6: 🥶 TRACKING UNDER · -N.N vs proj
otherwise: 📍 ON PACE · proj total N.N

±0.6 runs is the dead zone (statistical noise). The formula assumes a roughly uniform run distribution across innings — MLB scoring is slightly bunched in the 1st and late innings, but linear is close enough for vibe-check purposes.
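The pace logic can be sketched in a few lines of Python. The function name and return shape are hypothetical, and the sketch does not handle extra innings, where the linear formula would expect more than the full projected total.

```python
def pace_status(actual_total, projected_total, inning, is_top_inning,
                dead_zone=0.6):
    """Compare actual runs to the linear pre-game projection.

    Returns (label, delta), where delta is actual minus expected runs.
    """
    innings_credited = inning - 1 + (0.5 if is_top_inning else 1.0)
    expected = projected_total * innings_credited / 9
    delta = actual_total - expected
    if delta > dead_zone:
        return "TRACKING OVER", delta
    if delta < -dead_zone:
        return "TRACKING UNDER", delta
    return "ON PACE", delta


label, delta = pace_status(actual_total=7, projected_total=10.4,
                           inning=5, is_top_inning=False)
print(f"{label} · {delta:+.1f} vs proj")
```

With a 10.4-run projection and 7 runs on the board in the bottom of the 5th, the expected total so far is about 5.8, so the tile would read as tracking over by roughly 1.2 runs.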

Suspended-game handling

The MLB Stats API marks suspended games (rain delay mid-game) as abstractGameState: "Live" with detailedState: "Suspended". Without a special case, the site would render "LIVE · BOT 2" at 7 AM the next day for a game that resumes at 2 PM. v0.8 added a shared BB_effectiveGameState() resolver that maps:

detailedState: "Suspended" → ⏸ SUSPENDED · resumes [time]
detailedState: "Postponed" → ❌ POSTPONED
detailedState: "Delayed Start" → ⏸ DELAYED · [time]

Suspended games drop out of the live chat picker but keep their partial score (the 0-0 in the bottom of the 2nd is part of the story when it resumes).
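The mapping above amounts to checking detailedState before trusting abstractGameState. The sketch below illustrates that order in Python; it is not the site's BB_effectiveGameState() resolver, and the function name, the resume_time argument, and the fallback behaviour are all assumptions.

```python
def effective_game_state(status, resume_time=None):
    """Map a raw MLB status dict to a display pill.

    status is a dict with "abstractGameState" and "detailedState" keys.
    """
    detailed = status.get("detailedState", "")
    if detailed == "Suspended":
        suffix = f" · resumes {resume_time}" if resume_time else ""
        return f"⏸ SUSPENDED{suffix}"
    if detailed == "Postponed":
        return "❌ POSTPONED"
    if detailed == "Delayed Start":
        suffix = f" · {resume_time}" if resume_time else ""
        return f"⏸ DELAYED{suffix}"
    # Fallback (assumption): show the abstract state in upper case.
    return status.get("abstractGameState", "Unknown").upper()
```

Checking the detailed state first is what keeps a suspended game, which still reports an abstract state of "Live", from rendering as live.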

Live ticket tracking

Any user-logged ticket gets a real-time progress bar with per-leg status: PENDING → LIVE → CASH or MISS. Driven by Supabase realtime subscriptions; MLB game state polls every 20s.

"Why We Like It" Reasons v0.8

Every pick card carries a structured reason rail. Each reason is a {icon, text, weight, sign} dict, persisted to the reasons JSONB column. The composer (mlb_reasons.py) runs after every slate sync and regenerates them so the bullets always reflect the current model version.

Reason builders (ranked)

🎯 Edge — always #1 if the bet has an edge. Shows raw + calibrated edge%.
⚾ Run projection — model's projected final score when the delta is decisive (≥0.4 runs).
🔥 Momentum (v0.8) — fires when the bet side is hot (≥+4%) or cold (≤-4%) over last 10. Shows W-L, RPG, streak.
💪 Bullpen fatigue (v0.8) — fires when the OPPONENT's pen is burned or rested. Shows L3 reliever pitch count.
🧑‍⚖️ Umpire (v0.8) — fires when HP umpire's K/BB delta moves runs ≥1.5%. Pulled from a hand-seeded 20-ump database.
🌬️ Weather — wind direction × outfield bearing + temperature.
🏟️ Park — fires for parks with factor ≥110 (hitter) or ≤90 (pitcher).
🧭 Direction — AGREE (book + model aligned) or FLIP (model disagrees with book).
👥 Lineup — confirmed / partial / unconfirmed.
💰 Kelly — when the model wants real exposure (≥0.5 units).
⚖️ Calibration — when the recent-form calibration factor is unusually aggressive or conservative.

Composer logic

Builders iterate in rank order. Edge is always pinned at #1. The remaining slots get sorted by weight (strong > moderate > lean) so the most decisive non-edge reasons win, regardless of position in the builder list.
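One way to express the composer rule is a pinned edge reason followed by a stable sort of the rest. This is a sketch, not mlb_reasons.py: the "kind" field used to identify the edge reason and the three-reason limit argument are assumptions layered on the {icon, text, weight, sign} shape described above.

```python
WEIGHT_ORDER = {"strong": 0, "moderate": 1, "lean": 2}


def compose_reasons(reasons, max_reasons=3):
    """Pin the edge reason first, then fill by weight.

    reasons must arrive in builder rank order. Python's sort is stable,
    so reasons of equal weight keep their builder order.
    """
    edge = [r for r in reasons if r.get("kind") == "edge"]
    rest = [r for r in reasons if r.get("kind") != "edge"]
    rest.sort(key=lambda r: WEIGHT_ORDER.get(r["weight"], len(WEIGHT_ORDER)))
    return (edge[:1] + rest)[:max_reasons]
```

Because the sort is stable, the builder ranking only acts as a tie-breaker between reasons of the same weight.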

Honest framing

Reasons are framed around the bet side specifically. We only praise when a signal favors our pick, only warn when it's against us. No fluff — if momentum is neutral, the bullet doesn't fire and weather/park fills the slot instead.

Grading & Calibration Live

A model is only as good as its receipts. We grade every pick automatically from official MLB / NFL stat APIs the moment games finalize. Results land in the public ledger pages and inform future calibration.

Game pick grading

MLB game picks are graded by pg_cron polling statsapi.mlb.com every 15 minutes during play. NFL picks are graded the morning after each Sunday/Monday slate via a parallel pg_cron job pulling nflverse data.

Prop grading

The props grader (mlb_props_grade.py) fetches each completed game's box score and extracts per-player HR / hits / pitcher_strikeouts counts. For each prop row matching (slate_date, player_name):

actual_value = HR | hits | K count from box score
result_side = 'over' if actual > line, 'under' if actual < line, 'push' if actual == line
edge_correct = (edge_side == result_side) | null on push
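The grading rule translates almost line for line into Python. The function name below is hypothetical, and None stands in for the null stored on a push.

```python
def grade_prop(actual_value, line, edge_side):
    """Return (result_side, edge_correct) for one graded prop row."""
    if actual_value > line:
        result_side = "over"
    elif actual_value < line:
        result_side = "under"
    else:
        result_side = "push"
    edge_correct = None if result_side == "push" else edge_side == result_side
    return result_side, edge_correct
```

Half-point lines such as 0.5 or 1.5 can never push, so the push branch only matters for whole-number lines.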

Calibration

Calibration is the question: when the model says 60% over, does it hit 60% of the time? The Props Ledger buckets decided picks by model probability and shows the actual hit rate per bucket. If the green bar sits at or above the amber tick, the model is honest. If green is systematically below amber, the model is overconfident and the Kelly stake sizing should be shrunk.

Once we have 200+ decided picks (≈ 2 weeks of data), we plan to apply isotonic regression to remap raw model probabilities to calibrated probabilities. Same model, way better edges.
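The bucketed comparison behind the calibration check can be sketched with the standard library alone. The function name, the ten equal-width buckets, and the (model_prob, hit) input shape are assumptions; pushes are assumed to be excluded before this step.

```python
from collections import defaultdict


def calibration_table(picks, n_buckets=10):
    """Group decided picks by model probability.

    picks is an iterable of (model_prob, hit) pairs with hit True or False.
    Returns {bucket_index: {"n", "mean_prob", "hit_rate"}}.
    """
    totals = defaultdict(lambda: {"n": 0, "prob_sum": 0.0, "hits": 0})
    for model_prob, hit in picks:
        idx = min(int(model_prob * n_buckets), n_buckets - 1)
        bucket = totals[idx]
        bucket["n"] += 1
        bucket["prob_sum"] += model_prob
        bucket["hits"] += 1 if hit else 0
    return {
        idx: {
            "n": b["n"],
            "mean_prob": b["prob_sum"] / b["n"],
            "hit_rate": b["hits"] / b["n"],
        }
        for idx, b in sorted(totals.items())
    }
```

A bucket whose hit_rate sits well below its mean_prob across a meaningful sample is the overconfidence signal described above.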

Risk Management Live

Every published unit count uses fractional Kelly at a 0.25× multiplier. Full Kelly is mathematically optimal for compound growth but practically insane: a string of bad luck can wipe out a bankroll. The 0.25× multiplier means we bet a quarter of what full Kelly would suggest, trading slower expected growth for dramatically less variance.

Kelly formula

f* = (bp − q) / b
where
p = model win probability
q = 1 − p
b = decimal odds − 1
units = round(0.25 × f* × 100) // expressed as "units" with 1u = 1% bankroll

Practical caps

Max 5.0 units on any single play, regardless of stated edge
Any play with edge < 4% is hidden (not "solid") to avoid noise plays
Any play with model_prob > 95% or < 5% is dropped (model probably broken on that row)
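The Kelly formula and two of the caps above combine into one small function. This is a sketch under stated assumptions: the function name is hypothetical, rounding to one decimal place is assumed because unit counts like "3.2u" are published, and the edge < 4% filter is left out because it needs the devigged market probability as well.

```python
def kelly_units(p, decimal_odds, fraction=0.25, max_units=5.0):
    """Quarter-Kelly stake in units, where 1 unit = 1% of bankroll.

    Returns None when the row is dropped as a likely model error.
    """
    if p > 0.95 or p < 0.05:
        return None  # model probably broken on this row
    b = decimal_odds - 1.0
    q = 1.0 - p
    f_star = (b * p - q) / b
    if f_star <= 0:
        return 0.0  # no positive edge at this price, so no stake
    units = round(fraction * f_star * 100, 1)
    return min(units, max_units)


# 55% model probability at even money (decimal odds 2.00).
print(kelly_units(0.55, 2.0))
```

At 55% and even money, full Kelly is 10% of bankroll, so the quarter-Kelly stake comes out at 2.5 units.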

Bankroll discipline

1 unit = 1% of your bankroll. If you have $1,000, 1u = $10. If we publish "3.2u" we mean 3.2% of your roll. The numbers ONLY work if you scale them to YOUR bankroll. Betting flat $100 per play when we publish wildly different unit recommendations will destroy ROI.

Glossary Live

Terms you'll see across the site, defined precisely.

xwOBA: "Expected weighted on-base average" — what a batter's wOBA should have been based on launch angle + exit velocity, stripping out defense and luck. League avg ≈ .315. The single most predictive offensive metric in public baseball analytics.
xBA: Expected batting average from Statcast. Same idea as xwOBA but just for batting average. League avg ≈ .245.
Barrel rate: % of batted balls hit at the optimal launch-angle / exit-velocity combo for a HR. League avg ≈ 8.5%. Strongest leading indicator of power outbursts.
FIP: "Fielding Independent Pitching" — pitcher rating from K, BB, HBP, HR only (ignores balls in play). Strips out defense + park noise. Lower = better.
Poisson: Probability distribution for "rare events at a known average rate". HR per game, K per start, and hits per game all approximately follow Poisson once you know λ.
Devig: Removing the bookmaker's juice (vig) from posted odds to recover the implied probability the book is asserting. Standard methods: proportional, power, and Shin. We use Shin.
Edge %: edge = model_prob − fair_market_prob. A +5% edge means the model thinks the true probability is 5 percentage points higher than the devigged market price. NOT the same as ROI per bet.
Kelly: Optimal bet-sizing formula for compound bankroll growth. We use 0.25× Kelly (quarter Kelly) for variance reduction. 1u = 1% bankroll.
Bayesian shrinkage: Blending a player's observed stat with the league prior, weighted by sample size. A 50-PA player's xwOBA is mostly noise, so we pull it heavily toward league average. A 600-PA player has signal, so we trust it more.
Park factor: Multiplier for how a stadium affects a stat relative to league average. Coors Field's 1.20 HR factor means HRs are hit 20% more often there.
Run type: Tag indicating WHEN a pick was generated: morning (pre-lineup), lineup (after official lineups), injury (mid-day update), final (closest to game start).
Lock / Strong / Solid: Tier thresholds on edge magnitude. Lock ≥12%, Strong ≥7%, Solid ≥4%. Higher tier = larger model edge AND larger Kelly stake.
Momentum (v0.8): A team's run-differential and runs-per-game over its last 10 games, relative to league average. Drives a ±8% multiplier on the team's own projected runs. Hot teams hit better in the near term; cold teams keep cooling for a few more games before reverting.
Bullpen fatigue (v0.8): Total reliever pitches thrown over the last 3 days. ≤90 = rested, 91-239 = light/normal/heavy, ≥240 = burned. Burned pens give up runs at a higher rate when forced into duty the next day, so a burned opponent pen multiplies the bet side's projected runs by up to 1.06.
Umpire tendency (v0.8): Home-plate umpire's K% and BB% delta vs the league average. Pitcher-friendly umps (positive K delta, negative BB delta) call a wider zone → fewer runs. Hitter-friendly umps do the opposite. Bounded ±5%.
Bayesian shrinkage strength: How many observations the prior is worth, i.e. the sample size at which a team's own data and the league prior get equal weight. For offense we use strength=200 (≈ half a month of PAs). A team with 50 PAs ends up 50/(50+200) = 20% themselves and 80% league average. A 600-PA team ends up 600/(600+200) = 75% themselves.
Pace tracker (v0.8): Live in-game comparison of actual runs to the model's projected runs through the current inning, assuming linear scoring across 9 innings. Surfaces "tracking over / under / on pace" on every live tile.
Projected final score: Rounded whole-run version of the model's away_runs and home_runs output. The decimal projection is shown underneath as the honest signal.
Effective game state: UI-friendly mapping of MLB's raw game state. Suspended, Postponed, and Delayed Start get their own pills instead of being mis-displayed as "LIVE" with stale inning data.
Direction (AGREE / FLIP): AGREE = the model and the market favor the same team. FLIP = the model disagrees with the market. AGREEs are historically thicker edges (when both sharp money and our model land on the same side); FLIPs are thinner and tagged as risk.

Limitations & What We Won't Do Honest

A model that pretends to be perfect is a model you can't trust. Here's where ours has gaps and where you should expect bumps.

Known weaknesses

Sample size on small markets — a player with under 100 career PAs has unstable Statcast; even with shrinkage, edges on them are noisier than on established players.
Reliever Ks — pitcher_strikeouts model assumes starter usage; relievers get scored but their proj_IP estimate is rough.
Late lineup changes — if a star is scratched at 6:55 PM, we may have published a pre-lineup pick that becomes stale by first pitch.
Doubleheaders — second game of a DH has compounded fatigue + bullpen shifts the model doesn't fully capture yet.
Postseason — model is trained on regular-season patterns; relief usage and lineup management shift in October.

Things we will never do

Delete losing picks. Every result, win or loss, stays in the ledger.
Backfill "we had this earlier" claims. All picks are timestamped before games start. If we didn't publish it pre-game, it doesn't count.
Charge for picks. The picks are free. Always.
Sell guaranteed locks. No such thing exists in sports betting.
Recommend chasing losses. If the model has a bad week, the answer is not "bigger bets to make it back." The answer is unit discipline + waiting.

Sportsbook adaptation

Books are sharp and adapt fast. The +15% edges visible today on heavily-bet markets will likely tighten to +5% within months as books adjust their pricing. The durable strategy is constantly moving toward less-efficient markets — alt lines, lesser-traded players, AAA call-ups, weird game-time changes. We'll add those over time and call out when we do.

21+ / responsible play

Bad Bets is model output, not betting advice. Sports betting is legal only in select jurisdictions and only for 21+. If betting stops being fun, call 1-800-GAMBLER.
