What data does the F1 model use?

The model ingests race results, sprint results, qualifying times, and circuit metadata from the Ergast/Jolpica API. It uses data from 2003 onward for Elo priors and from the current season for feature computation.

How many simulations does the model run?

The production model runs 10,000 Monte Carlo simulations per prediction cycle. Each simulation plays out every remaining race with randomized outcomes weighted by driver strength.

How often is the model updated?

The computation job runs every 6 hours. Since F1 data only changes after race weekends, most mid-week cycles produce identical results. Post-race updates appear within one cycle of official results.

Does the model account for weather?

Yes. Each circuit preview includes a wet-risk flag derived from historical weather data at that venue. Wet conditions affect DNF probabilities and can shift win probabilities toward drivers with strong wet-weather records.

F1 Championship Model Methodology

The Odds Reference F1 model predicts championship outcomes by combining Elo ratings, circuit-specific historical data, and Monte Carlo simulation. It processes race results from 2003 onward and produces driver-level championship probabilities after each completed round.

What Data Sources Feed the Model?

The model ingests three primary datasets, all derived from official FIA timing data:

Dataset	Records	Key Fields
Race results	~10,000 entries (2003-present)	driver, constructor, grid, finish, status, circuit
Qualifying	~8,000 entries (2003-present)	driver, constructor, Q1/Q2/Q3 times
Sprint results	~400 entries (2021-present)	driver, constructor, grid, finish

Circuit metadata (type classification, historical weather) is derived from lap-time analysis and external weather APIs. The model stores processed features in Parquet format for fast iteration.

How Does Feature Engineering Work?

Several pipelines transform raw timing data into race-prediction features:

Driver strength features:

Combined Elo rating (driver + constructor components)
Elo momentum (3-race rolling delta)
Season points and gap to championship leader

Circuit features:

Historical average finish at this specific circuit
Best-ever finish at this circuit
Number of previous starts at this circuit
Circuit type percentile (performance at similar circuit types: power, high downforce, street, mixed)

Race context features:

Grid position and front-row indicator
Qualifying gap to pole (milliseconds)
Teammate qualifying delta
DNF probability per driver-constructor-circuit combination

The championship standings display the output features alongside the raw data so readers can see what drives each driver’s probability.
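A few of the features listed above are simple enough to sketch directly. The snippet below is illustrative only: the function names are hypothetical, and it assumes that "3-race rolling delta" means the latest Elo rating minus the rating three races earlier, which the source does not spell out.

```python
from typing import Dict, List


def elo_momentum(ratings: List[float], window: int = 3) -> float:
    """Elo change over the last `window` races (assumed definition).

    `ratings` holds one rating per completed race, oldest first.
    Returns 0.0 when there is not enough history to look back `window` races.
    """
    if len(ratings) <= window:
        return 0.0
    return ratings[-1] - ratings[-1 - window]


def gap_to_pole_ms(best_times_ms: Dict[str, int]) -> Dict[str, int]:
    """Each driver's best qualifying time minus the fastest time, in milliseconds."""
    pole = min(best_times_ms.values())
    return {driver: t - pole for driver, t in best_times_ms.items()}


def is_front_row(grid_position: int) -> bool:
    """Front-row indicator: grid slots 1 and 2."""
    return grid_position in (1, 2)
```

For example, `gap_to_pole_ms({"A": 90000, "B": 90250})` gives driver B a gap of 250 ms and the pole sitter a gap of 0.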

How Does the Monte Carlo Simulation Work?

After feature computation, the model simulates the remaining season 10,000 times in production:

For each remaining race, convert driver Elo and circuit features into per-driver win probabilities
Draw race outcomes using weighted random sampling with noise to model race-day variance
Award FIA points (25-18-15-12-10-8-6-4-2-1 plus sprint points where applicable)
Accumulate standings across all remaining rounds
Record the champion for each simulation

The championship probability for each driver equals the share of simulations in which that driver finishes as champion. Noise scaling is calibrated so that the simulation’s predicted finishing distributions match historical variance at each position.
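The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: it replaces the model's weighted sampling with a simpler stand-in (each driver's strength plus Gaussian noise, sorted to get a finishing order), and it omits sprint points, DNFs, and tie-break rules. The function names, the `noise_sd` default, and the strength values are all assumptions.

```python
import random
from collections import Counter
from typing import Dict, List

POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]  # FIA race points for P1..P10


def simulate_race(strengths: Dict[str, float], noise_sd: float,
                  rng: random.Random) -> List[str]:
    """Return one finishing order: strength plus Gaussian noise, best first."""
    noisy = {d: s + rng.gauss(0.0, noise_sd) for d, s in strengths.items()}
    return sorted(noisy, key=lambda d: noisy[d], reverse=True)


def championship_probabilities(current_points: Dict[str, int],
                               remaining_races: List[Dict[str, float]],
                               n_sims: int = 10_000,
                               noise_sd: float = 100.0,  # assumed, uncalibrated
                               seed: int = 0) -> Dict[str, float]:
    """Share of simulated seasons each driver wins.

    `current_points` should list every driver, including those on zero points.
    `remaining_races` holds one strength mapping per race still to be run.
    """
    rng = random.Random(seed)
    titles: Counter = Counter()
    for _ in range(n_sims):
        totals = dict(current_points)
        for strengths in remaining_races:
            order = simulate_race(strengths, noise_sd, rng)
            # zip stops at the shorter sequence, so only the top 10 score
            for driver, pts in zip(order, POINTS):
                totals[driver] = totals.get(driver, 0) + pts
        # Ties go to the first driver encountered; real tie-breaks are ignored
        titles[max(totals, key=lambda d: totals[d])] += 1
    return {driver: titles[driver] / n_sims for driver in current_points}
```

With a fixed seed the result is reproducible, which makes it easier to compare runs when tuning the noise level.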

How Are Teammate Deltas Computed?

Teammate pace comparison uses qualifying lap times, which isolate driver performance better than race results (where strategy, traffic, and incidents add noise). For each constructor:

Season delta: mean qualifying time difference across all rounds
Rolling 4-race delta: recent-form comparison using only the last four qualifying sessions
Pace ratio: proportional time difference (1.002 means the slower driver is 0.2% off)

These deltas appear on the F1 dashboard for all ten constructors.
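As a rough sketch of the three measures, the function below takes one pair of comparable qualifying lap times per round. The function name, the input layout, and the choice to compute the pace ratio from season-mean lap times are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def teammate_deltas(times: List[Tuple[float, float]],
                    window: int = 4) -> Dict[str, float]:
    """Compare two teammates from per-round qualifying laps.

    `times` holds (driver_a_seconds, driver_b_seconds) per round, oldest first.
    Negative deltas mean driver A was faster.
    """
    diffs = [a - b for a, b in times]
    mean_a = mean(a for a, _ in times)
    mean_b = mean(b for _, b in times)
    return {
        "season_delta": mean(diffs),
        "rolling_delta": mean(diffs[-window:]),  # last `window` sessions only
        # Slower mean over faster mean, so the ratio is always >= 1.0
        "pace_ratio": max(mean_a, mean_b) / min(mean_a, mean_b),
    }
```

A single round of 90.00 s against 90.18 s yields a pace ratio of about 1.002, matching the 0.2% example above.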

How Is Model Quality Assessed?

Model calibration is evaluated using several metrics:

Brier score: the mean squared difference between each predicted probability and the binary outcome (did the predicted champion actually win?); lower is better
Model-market correlation: alignment between our Elo-derived odds and Kalshi market prices
Per-position calibration: do drivers given a 20% win probability actually win about 20% of the time across historical backtests?

The model runs backtests against completed seasons to verify that circuit-aware predictions outperform circuit-agnostic baselines.
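Two of these checks are easy to illustrate. The sketch below computes a Brier score and a simple binned calibration table from paired predictions and outcomes. The function names and the equal-width binning are assumptions, not a description of the production evaluation.

```python
from typing import List, Tuple


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_table(probs: List[float], outcomes: List[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins.

    Returns (mean predicted probability, observed frequency, count) for each
    non-empty bin. A well-calibrated model has the first two values close.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(o for _, o in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows
```

A forecaster that always says 50% scores 0.25 on the Brier measure, which is a useful baseline when reading backtest results.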

What Are the Model’s Limitations?

No model captures everything. Known blind spots include:

Regulation changes: Major aerodynamic or engine rule shifts (such as the 2022 ground-effect regulations) break historical patterns
Driver transfers: When a driver moves to a new constructor, the model relies on constructor Elo until enough new data accumulates
Reliability: Mechanical DNFs are modeled probabilistically, but rare catastrophic failures (engine blowups, gearbox issues) are inherently unpredictable
Development rate: In-season car upgrades shift constructor performance, but the model only sees results, not upgrade schedules

Key Takeaways

The model combines 20+ years of F1 data with Elo ratings, circuit history, and Monte Carlo simulation
Circuit-specific features prevent one dominant venue from inflating a driver’s championship odds at every future race
Teammate deltas from qualifying isolate driver skill from car performance
The model runs every 6 hours and is validated via backtests, Brier scores, and market correlation
Live output is available on the F1 championship dashboard