In this post I will go from making individual match predictions using the Bradley-Terry model through to predicting the Gold medal winners of the track cycling Individual Sprint, and derive an optimal betting strategy based on a spread-bet Kelly Criterion optimised under posterior uncertainty.
This is the third post in a series:
Tokyo 2020 Betting I: Predictive Models for Pairwise Matches
Tokyo 2020 Betting II: Model Refinement and Feature Engineering
Tokyo 2020 Betting III: From Matches to Medals… and Bookies (This Post)
Due to a lag between drafting and analysis, this post was originally published with the betting stakes only. I have retrospectively added the details of how these stakes were derived.
So far in this series I’ve focused on predicting the outcome of single matches between two athletes, and derived a bespoke Bradley-Terry model for this purpose.
To construct a betting strategy I will need to turn the probability that a given rider wins a single match into the probability that they win the whole tournament, and hence the gold medal.
In the first section of this post I introduce the tournament format for the Tokyo 2020 Individual Sprint, and how the outputs of the Bradley-Terry model are used to derive a distribution for the gold medal winner.
In the second part I will introduce the Kelly Criterion and derive generalisations that allow for multiple-outcome bets, handle posterior uncertainty and account for additional caution.
At the end of the post I’ve included the log of the bets that I ended up placing, which were originally published in a holding post whilst the event was running.
This post makes some assumptions about you!
I’ll assume you’ve read the previous posts in the series, though this post should work as a standalone.
I’ll also assume you have some knowledge of basic betting terminology (fractional odds, stakes). To derive my betting strategy I will use some discrete probability and formalise a non-linear optimisation model.
If you’re interested in reading the underlying code, it is written in R.
The Individual Sprint has a complex tournament structure, in which the winning athlete will compete in between 10 and 16 sprints before they can claim the medal!
There are four main parts to the tournament, which in Tokyo 2020 will see 30 athletes compete:
Overview
A qualifying round that sees all athletes competing individually to set the fastest time, with the six slowest athletes eliminated.
1/32, 1/16 and 1/8 Finals that see the athletes compete in pairs to win a single sprint. The winner automatically qualifies for the next round (e.g. 1/32 Finals winners qualify for the 1/16 Finals).
Repechage races that see the losers from the previous round competing to take any remaining places in the next round (e.g. losers of the 1/32 Finals compete for the four remaining places in the 1/16 Finals).
Quarterfinals, Semifinals and Finals, in which the remaining eight athletes compete in pairs over the best of three sprints; the losing semifinalists race for Bronze.
Tokyo 2020 Summary
Round | Athletes Competing | Matches x Athletes per Match | Sprints per Match | Athletes Qualifying |
---|---|---|---|---|
Qualifying | 30 | 30 x 1 | 1 | 24 |
1/32 Finals | 24 | 12 x 2 | 1 | 12 |
Repechage 1 | 12 | 4 x 3 | 1 | 4 |
1/16 Finals | 16 | 8 x 2 | 1 | 8 |
Repechage 2 | 8 | 4 x 2 | 1 | 4 |
1/8 Finals | 12 | 6 x 2 | 1 | 6 |
Repechage 3 | 6 | 2 x 3 | 1 | 2 |
Quarterfinals | 8 | 4 x 2 | Best of 3 | 4 |
Semifinals | 4 | 2 x 2 | Best of 3 | 2 |
Finals | 2 | 1 x 2 | Best of 3 | 1 |
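Counting the qualifying ride as a sprint, the summary table accounts for the range of 10 to 16 sprints quoted above. A rider who wins every match, taking each best-of-three 2-0, rides the fewest sprints; one who loses each of the 1/32, 1/16 and 1/8 Finals but wins the repechage that follows, and needs all three sprints in every best-of-three, rides the most:

\[
\underbrace{1}_{\text{qualifying}} + \underbrace{1 + 1 + 1}_{\text{1/32, 1/16, 1/8 Finals}} + \underbrace{2 + 2 + 2}_{\text{QF, SF, Final}} = 10,
\qquad
\underbrace{1}_{\text{qualifying}} + \underbrace{3 \times (1 + 1)}_{\text{each round lost, then its repechage won}} + \underbrace{3 \times 3}_{\text{QF, SF, Final}} = 16.
\]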
Tokyo 2020 Detail
The tables below provide the details that determine which riders face each other in each round; they are adapted from the table published by the UCI in its Track Regulations.
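The initial rider codes N1-N24 are in order of the time posted in the qualifying round: N1 is the fastest qualifier, N24 the slowest. The sprints column gives the number of sprint wins a rider needs to take the match: 1 for a single sprint, 2 for a best of three.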
1/32 Finals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
1/32 Finals | 1 | N1 | N24 | NA | 1A1 | 1A2 | 1 |
1/32 Finals | 2 | N2 | N23 | NA | 2A1 | 2A2 | 1 |
1/32 Finals | 3 | N3 | N22 | NA | 3A1 | 3A2 | 1 |
1/32 Finals | 4 | N4 | N21 | NA | 4A1 | 4A2 | 1 |
1/32 Finals | 5 | N5 | N20 | NA | 5A1 | 5A2 | 1 |
1/32 Finals | 6 | N6 | N19 | NA | 6A1 | 6A2 | 1 |
1/32 Finals | 7 | N7 | N18 | NA | 7A1 | 7A2 | 1 |
1/32 Finals | 8 | N8 | N17 | NA | 8A1 | 8A2 | 1 |
1/32 Finals | 9 | N9 | N16 | NA | 9A1 | 9A2 | 1 |
1/32 Finals | 10 | N10 | N15 | NA | 10A1 | 10A2 | 1 |
1/32 Finals | 11 | N11 | N14 | NA | 11A1 | 11A2 | 1 |
1/32 Finals | 12 | N12 | N13 | NA | 12A1 | 12A2 | 1 |
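The 1/32 Finals seeding follows a simple rule: match i pits N(i) against N(25 - i), so the fastest qualifier meets the slowest and so on inwards. A minimal Python sketch of that rule (the post's own code is in R) is below. The function name `first_round_pairs` is hypothetical, and it assumes qualifying times arrive as a dictionary of rider to time in seconds, with lower meaning faster.

```python
def first_round_pairs(qualifying_times, n_qualifiers=24):
    """Hypothetical helper: 1/32 Finals pairings from {rider: qualifying time in seconds}."""
    # Sort fastest first and keep the top 24: position 0 is N1, position 23 is N24.
    # The six slowest of the 30 starters are eliminated here.
    ranked = sorted(qualifying_times, key=qualifying_times.get)[:n_qualifiers]
    # Match i pairs N_i with N_(25 - i): fastest vs slowest, then inwards.
    return [(ranked[i], ranked[n_qualifiers - 1 - i]) for i in range(n_qualifiers // 2)]
```

For 30 starters this returns 12 pairs, the first being the fastest and the 24th fastest qualifier.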
Repechage 1
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Repechage 1 | 13 | 1A2 | 8A2 | 9A2 | 1B | NA | 1 |
Repechage 1 | 14 | 2A2 | 7A2 | 10A2 | 2B | NA | 1 |
Repechage 1 | 15 | 3A2 | 6A2 | 11A2 | 3B | NA | 1 |
Repechage 1 | 16 | 4A2 | 5A2 | 12A2 | 4B | NA | 1 |
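The Repechage 1 heats follow a pattern too: heat k takes the losers of 1/32 Finals matches k, 9 - k and 8 + k, and its winner is coded kB. A short Python sketch of that mapping, with the hypothetical name `repechage_1_heats`:

```python
def repechage_1_heats():
    """Hypothetical helper: map each Repechage 1 winner code to the loser codes in its heat."""
    # Heat k (k = 1..4) contains the losers (codes jA2) of 1/32 Finals
    # matches j = k, 9 - k and 8 + k, matching the table above.
    return {f"{k}B": [f"{j}A2" for j in (k, 9 - k, 8 + k)] for k in range(1, 5)}
```

Between them the four heats use each of the twelve 1/32 Finals losers exactly once.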
1/16 Finals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
1/16 Finals | 17 | 1A1 | 4B | NA | 1C1 | 1C2 | 1 |
1/16 Finals | 18 | 2A1 | 3B | NA | 2C1 | 2C2 | 1 |
1/16 Finals | 19 | 3A1 | 2B | NA | 3C1 | 3C2 | 1 |
1/16 Finals | 20 | 4A1 | 1B | NA | 4C1 | 4C2 | 1 |
1/16 Finals | 21 | 5A1 | 12A1 | NA | 5C1 | 5C2 | 1 |
1/16 Finals | 22 | 6A1 | 11A1 | NA | 6C1 | 6C2 | 1 |
1/16 Finals | 23 | 7A1 | 10A1 | NA | 7C1 | 7C2 | 1 |
1/16 Finals | 24 | 8A1 | 9A1 | NA | 8C1 | 8C2 | 1 |
Repechage 2
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Repechage 2 | 25 | 1C2 | 8C2 | NA | 1D1 | NA | 1 |
Repechage 2 | 26 | 2C2 | 7C2 | NA | 2D1 | NA | 1 |
Repechage 2 | 27 | 3C2 | 6C2 | NA | 3D1 | NA | 1 |
Repechage 2 | 28 | 4C2 | 5C2 | NA | 4D1 | NA | 1 |
1/8 Finals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
1/8 Finals | 29 | 1C1 | 4D1 | NA | 1E1 | 1E2 | 1 |
1/8 Finals | 30 | 2C1 | 3D1 | NA | 2E1 | 2E2 | 1 |
1/8 Finals | 31 | 3C1 | 2D1 | NA | 3E1 | 3E2 | 1 |
1/8 Finals | 32 | 4C1 | 1D1 | NA | 4E1 | 4E2 | 1 |
1/8 Finals | 33 | 5C1 | 8C1 | NA | 5E1 | 5E2 | 1 |
1/8 Finals | 34 | 6C1 | 7C1 | NA | 6E1 | 6E2 | 1 |
Repechage 3
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Repechage 3 | 35 | 1E2 | 4E2 | 5E2 | 1F1 | NA | 1 |
Repechage 3 | 36 | 2E2 | 3E2 | 6E2 | 2F1 | NA | 1 |
Quarterfinals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Quarterfinals | 37 | 1E1 | 2F1 | NA | 1G1 | NA | 2 |
Quarterfinals | 38 | 2E1 | 1F1 | NA | 2G1 | NA | 2 |
Quarterfinals | 39 | 3E1 | 6E1 | NA | 3G1 | NA | 2 |
Quarterfinals | 40 | 4E1 | 5E1 | NA | 4G1 | NA | 2 |
Semifinals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Semifinals | 41 | 1G1 | 4G1 | NA | 1H1 | 1H2 | 2 |
Semifinals | 42 | 2G1 | 3G1 | NA | 2H1 | 2H2 | 2 |
Finals
round | match_no | rider_code_1 | rider_code_2 | rider_code_3 | winner_code | loser_code | sprints |
---|---|---|---|---|---|---|---|
Finals | 43 | 1H1 | 2H1 | NA | Gold | Silver | 2 |
Finals | 44 | 1H2 | 2H2 | NA | Bronze | NA | 2 |
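The detail tables can also be transcribed into a data structure and checked mechanically. The sketch below does this in Python (the post's own code is in R); `bracket` and `check_bracket` are hypothetical names. Each match is stored as (rider codes, winner code, loser code, wins needed), listed in play order, with the sprints column read as the number of sprint wins needed. The check confirms that every code is decided before it is used and that no code is used twice.

```python
def bracket():
    """Hypothetical transcription: (rider_codes, winner_code, loser_code, wins_needed), in play order."""
    m = [([f"N{i}", f"N{25 - i}"], f"{i}A1", f"{i}A2", 1) for i in range(1, 13)]  # 1/32 Finals
    m += [([f"{j}A2" for j in (k, 9 - k, 8 + k)], f"{k}B", None, 1)
          for k in range(1, 5)]  # Repechage 1
    m += [([f"{k}A1", f"{5 - k}B" if k <= 4 else f"{17 - k}A1"], f"{k}C1", f"{k}C2", 1)
          for k in range(1, 9)]  # 1/16 Finals
    m += [([f"{k}C2", f"{9 - k}C2"], f"{k}D1", None, 1) for k in range(1, 5)]  # Repechage 2
    m += [([f"{k}C1", f"{5 - k}D1" if k <= 4 else f"{13 - k}C1"], f"{k}E1", f"{k}E2", 1)
          for k in range(1, 7)]  # 1/8 Finals
    m += [(["1E2", "4E2", "5E2"], "1F1", None, 1),
          (["2E2", "3E2", "6E2"], "2F1", None, 1)]  # Repechage 3
    m += [([a, b], w, None, 2) for a, b, w in [("1E1", "2F1", "1G1"), ("2E1", "1F1", "2G1"),
                                              ("3E1", "6E1", "3G1"), ("4E1", "5E1", "4G1")]]  # Quarterfinals
    m += [(["1G1", "4G1"], "1H1", "1H2", 2), (["2G1", "3G1"], "2H1", "2H2", 2)]  # Semifinals
    m += [(["1H1", "2H1"], "Gold", "Silver", 2), (["1H2", "2H2"], "Bronze", None, 2)]  # Finals
    return m


def check_bracket(matches):
    """Raise if a code is used before it is decided or used twice; return the codes never used."""
    available = {f"N{i}" for i in range(1, 25)}  # the 24 qualifiers
    used = set()
    for riders, winner, loser, _ in matches:
        for code in riders:
            if code not in available or code in used:
                raise ValueError(f"{code} is undecided or already used")
            used.add(code)
        available.update(c for c in (winner, loser) if c is not None)
    return available - used
```

For this bracket the 44 matches pass the check, and the only codes left unused are the three medals. Keeping the codes as strings matters: read as numbers, a code such as `1E1` becomes 10 in scientific notation.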
To forecast the gold medal winner I will simulate results for each match in the tournament, using the detailed tournament structure tables above.
In short, simulating the tournament will involve:
Use the rider qualifying times to pair riders according to the 1/32 Finals table (in the Tokyo 2020 Detail tab above).
Sample the winner/loser of each match using the Bernoulli distribution implied by the Bradley-Terry model (recapped below). Assign each rider the appropriate winner/loser code from the detailed table.
Repeat the above for each of the successive rounds, until the Gold medal winner is decided.
That sounds simple enough, but there are a few things for us to unpack here.
The Bradley-Terry model assumes that in a match between athletes \(r\) and \(s\),
\[\mathbf P[r \text{ beats } s] = \frac{\beta_r}{\beta_r + \beta_s}.\]
Given a set of parameters \((\beta_r)\) I can use the formula above to sample the winner of any given match, as required in step two above.
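A minimal Python sketch of that sampling step follows (the post's own code is in R; the function names are hypothetical). A single sprint is one Bernoulli draw with the Bradley-Terry probability. For the best-of-three rounds the sketch assumes that each sprint is an independent draw with the same probability \(p\), in which case a rider wins the match 2-0, or 2-1 after splitting the first two sprints, giving \(p^2 + 2p^2(1-p) = p^2(3 - 2p)\).

```python
import random


def bt_win_prob(beta_r, beta_s):
    """Bradley-Terry probability that rider r beats rider s."""
    return beta_r / (beta_r + beta_s)


def sample_match(beta_r, beta_s, wins_needed=1, rng=random):
    """Return True if r beats s.

    wins_needed=1 is a single sprint (one Bernoulli draw). wins_needed=2 is a
    best of three, ASSUMED here to be independent sprints with the same
    Bradley-Terry probability, raced until one rider has two wins.
    """
    p = bt_win_prob(beta_r, beta_s)
    r_wins = s_wins = 0
    while r_wins < wins_needed and s_wins < wins_needed:
        if rng.random() < p:
            r_wins += 1
        else:
            s_wins += 1
    return r_wins == wins_needed


def best_of_three_prob(p):
    """Closed form under the same assumption: win 2-0, or 2-1 after splitting the first two."""
    return p * p + 2 * p * p * (1 - p)
```

For example, a rider with a 0.6 chance of winning each sprint wins a best of three with probability 0.648, so the format slightly amplifies the stronger rider's edge.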
The previous posts have focused on estimating the parameters \(\beta_r\), with the final model taking the form \(\beta_r = \exp \left( \alpha_r^{(m)} + \kappa t_r \right)\), where
\(\alpha_r^{(m)}\) is the estimated strength of athlete \(r\) going into match \(m\), taking into account time-varying effects and a home advantage (which in Tokyo only affects the two Japanese competitors).
\(t_r\) is the athlete’s qualifying time in the current competition, and along with the estimated coefficient \(\kappa\) this allows us to take into account the athlete’s current form.
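Substituting this form of \(\beta_r\) into the Bradley-Terry formula shows that only the differences between the two riders matter:

\[
\mathbf P[r \text{ beats } s]
= \frac{e^{\alpha_r^{(m)} + \kappa t_r}}{e^{\alpha_r^{(m)} + \kappa t_r} + e^{\alpha_s^{(m)} + \kappa t_s}}
= \frac{1}{1 + \exp\left(-\left[\alpha_r^{(m)} - \alpha_s^{(m)} + \kappa \left(t_r - t_s\right)\right]\right)}.
\]

Adding the same constant to every \(\alpha^{(m)}\) therefore leaves the match probabilities unchanged, and a faster qualifying time (a smaller \(t_r\)) improves a rider's chances only if \(\kappa < 0\).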
Sampling a tournament is just a case of sampling match outcomes, and then using the detailed tournament information in the tabs above to identify who to pair in the next round of matches. The animation below gives an example of this dynamic.
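The whole procedure can be sketched as a short Monte Carlo loop. The Python below (the post's own code is in R) is a minimal sketch, not the implementation used for this series, and `bracket`, `play` and `simulate_gold` are hypothetical names. It rests on several assumptions: the bracket is a compact transcription of the detail tables, with the sprints column read as the number of sprint wins needed; each rider's \(\beta_r\) is held fixed for the whole tournament; best-of-three matches are treated as independent sprints; and the three-rider repechage heats use Luce's choice rule, \(\mathbf P[r \text{ wins}] = \beta_r / \sum_s \beta_s\).

```python
import random
from collections import Counter


def bracket():
    """Hypothetical transcription: (rider_codes, winner_code, loser_code, wins_needed), in play order."""
    m = [([f"N{i}", f"N{25 - i}"], f"{i}A1", f"{i}A2", 1) for i in range(1, 13)]  # 1/32 Finals
    m += [([f"{j}A2" for j in (k, 9 - k, 8 + k)], f"{k}B", None, 1) for k in range(1, 5)]  # Rep 1
    m += [([f"{k}A1", f"{5 - k}B" if k <= 4 else f"{17 - k}A1"], f"{k}C1", f"{k}C2", 1)
          for k in range(1, 9)]  # 1/16 Finals
    m += [([f"{k}C2", f"{9 - k}C2"], f"{k}D1", None, 1) for k in range(1, 5)]  # Rep 2
    m += [([f"{k}C1", f"{5 - k}D1" if k <= 4 else f"{13 - k}C1"], f"{k}E1", f"{k}E2", 1)
          for k in range(1, 7)]  # 1/8 Finals
    m += [(["1E2", "4E2", "5E2"], "1F1", None, 1), (["2E2", "3E2", "6E2"], "2F1", None, 1)]  # Rep 3
    m += [([a, b], w, None, 2) for a, b, w in [("1E1", "2F1", "1G1"), ("2E1", "1F1", "2G1"),
                                              ("3E1", "6E1", "3G1"), ("4E1", "5E1", "4G1")]]  # QF
    m += [(["1G1", "4G1"], "1H1", "1H2", 2), (["2G1", "3G1"], "2H1", "2H2", 2)]  # Semifinals
    m += [(["1H1", "2H1"], "Gold", "Silver", 2), (["1H2", "2H2"], "Bronze", None, 2)]  # Finals
    return m


def play(riders, wins_needed, beta, rng):
    """Return (winner, loser) for one match; loser is None for three-rider heats."""
    if len(riders) == 3:
        # ASSUMPTION: three-rider heats use Luce's choice rule, P[r] = beta_r / sum(beta).
        return rng.choices(riders, weights=[beta[x] for x in riders], k=1)[0], None
    r, s = riders
    p = beta[r] / (beta[r] + beta[s])  # Bradley-Terry probability that r wins a sprint
    r_wins = s_wins = 0
    while max(r_wins, s_wins) < wins_needed:  # ASSUMPTION: independent sprints
        if rng.random() < p:
            r_wins += 1
        else:
            s_wins += 1
    return (r, s) if r_wins > s_wins else (s, r)


def simulate_gold(ranked_riders, beta, n_sims=10_000, seed=1):
    """Monte Carlo estimate of P[rider wins Gold].

    ranked_riders: the 24 qualifiers, fastest first (N1, ..., N24).
    beta: {rider: beta_r}, held fixed for the whole tournament (an assumption).
    """
    rng = random.Random(seed)
    matches = bracket()
    golds = Counter()
    for _ in range(n_sims):
        who = {f"N{i}": rider for i, rider in enumerate(ranked_riders, start=1)}
        for codes, win_code, lose_code, wins_needed in matches:
            winner, loser = play([who[c] for c in codes], wins_needed, beta, rng)
            who[win_code] = winner
            if lose_code is not None:
                who[lose_code] = loser
        golds[who["Gold"]] += 1
    return {rider: count / n_sims for rider, count in golds.items()}
```

To use it, pass the 24 qualifiers in qualifying order and a dictionary of \(\beta_r\) values, for instance computed as \(\exp(\alpha_r^{(m)} + \kappa t_r)\) from a fitted model. One way to reflect posterior uncertainty would be to draw a fresh set of \(\beta_r\) from the posterior for each simulated tournament rather than holding them fixed.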