Infrequently Frequentist: Tokyo 2020 Betting II: Model Refinement and Feature Engineering

Last time I introduced the basic model I’ll be using to predict the results of the track cycling Individual Sprint at Tokyo 2020.

This post will pick up where that ended, with the focus turning to building on the basic model: adapting it to handle more of the specifics of the sport.

The table below summarises the changes I’ll explore:

Post	Model	Description
Tokyo 2020 Betting I	bt1	Basic Bradley-Terry
Tokyo 2020 Betting I	bt2	Informative (Gamma) Prior
Tokyo 2020 Betting II	bt3	Match Likelihood
	bt4	Home advantage
	bt5	Time Dependent Strengths
	bt6	Qualifying Times

I’ll present the model developments in the order I made them - which matches the above - rather than the level of impact they have on evaluation metrics. In the conclusion I’ll reflect on how I could have improved my approach to prioritising the developments.

Assumptions

This post makes some assumptions about you!

I’ll assume you’ve read the previous post in the series (though this might hold-up as a stand-alone): I won’t recap the data or basic model.

The model variants will rely on a few technical tools that I’ll recap, but not fully introduce:

the role of the likelihood in regression models,
hierarchical (random) effects, and
Gaussian processes.

If you’re interested in reading the underlying code, this is in R and Stan.

The Match Likelihood

The Bradley-Terry model assumes matches are won or lost, with no delineation of the scale of a win.

In the early heats of the individual sprint this is an accurate assumption: the format is a knockout where a single sprint is contested.

From the quarterfinals, matches are contested in a best-of-three format which introduces a scale to winning: if $p$ is the probability that a given athlete wins a single sprint, the outcomes that they win a best-of-three match are:

Scenario	Sprints Contested	Prob. Occurrence
Wins first two sprints outright	2	$p^2$
Wins first and third, loses second	3	$p(1-p)p$
Loses first, wins second and third	3	$(1-p)p^2$
Total		$p^2 + 2p^2(1-p)$

Show code

p_best_of_three <- ggplot(tibble(p=c(0,1)), aes(p)) +
  
  stat_function(
    aes(linetype = '2', color = '2'),
    fun = function(p){ p^2  },
    size=1, color = infreq_palette["darkblue"]
  ) +
  
  stat_function(
    aes(linetype = '3', color = '3'),
    fun = function(p){(2*p^2*(1-p))},
    size=1, color = infreq_palette["darkblue"]
  ) + 
  
  stat_function(
    aes(linetype = 'Total', color = 'Total'),
    fun = function(p) p^2 + 2*p^2*(1-p) ,
    size = 2, color = infreq_palette["orange"]
  )  +
  
  geom_text(
    data = tribble(~p, ~y, ~text, ~color,
            0.9, 0.60,'2 sprints', 'Total',
            0.9, 0.31, '3 sprints','3',
            0.92, 1.07, 'Total', '2'),
    aes(p,y,label=text),
    color = c(infreq_palette["darkblue"], infreq_palette["darkblue"], infreq_palette["orange"]),
    size = c(5,5,6)
  ) +
  
  scale_y_continuous(limits=c(0,1.2), breaks = seq(0,1.2,by=0.2)) +
  coord_cartesian(ylim = c(0,1.1)) +
  labs(x="Prob. Wins Sprint", y="Prob. Wins Match") +
  scale_linetype_manual(values = c("solid", "dotted", "solid")) +
  scale_color_manual(values = c(infreq_palette["darkblue"],infreq_palette["darkblue"],infreq_palette["orange"])) +
  theme(legend.position = "none")

p_best_of_three

The best-of-three format can be considered fairer than a single sprint, as it increases the probability that the stronger athlete wins.

I’ll adapt the Bradley-Terry model to use a custom likelihood: in early rounds this will be as before, but for later rounds it will take into account the best-of-three format.

One convenience I’ll introduce is to formulate the model from the perspective of the winning rider. In words the likelihood I’ll consider is:

Given that the winning athlete, $w$, beat the loser, $l$, in $S$ sprints: how likely is it that the strength difference between them is $\delta = \alpha_w - \alpha_l$?

To simplify the formula for the likelihood I’ll let $p_\delta = \text{logit}^{-1}(\delta)$ which, as covered in the last post, is the probability of winning a single sprint given the strength difference $\delta$.

The match likelihood is then defined as:

\[ L_{\text{match}}\big(\alpha_w - \alpha_l = \delta \, |\, S = s\big) = \begin{cases} p_\delta, & s = 1 \\ \\ p_\delta^2, & s = 2\\ \\ 2p_\delta^2 (1-p), & s = 3 \end{cases} \]

The formula above is not a true likelihood per se, I’m using two convenient properties of the race format and the data, that allow this reduced definition:

Firstly, the case $s=1$ gets invoked on a separate subset of the data to the cases $s=2,\,3$. So really this is a separate likelihood (equivalent to the Bernoulli, logistic regression, model). Combining the two scenarios in one formula is a convenience as I can fit all the data to one model, rather than splitting out the cases.

Secondly, both likelihoods ($s=1$ case, and $s=2,3$) are presented from the perspective of the winning rider: i.e. we don’t have the cases $s=-1,-2,-3$. In reality these cases do exist - but by feeding the model data from the winning perspective the other scenarios won’t be induced.

BT3 \[ \begin{align*} \bf{\text{Priors}}\\ \sigma & \sim \text{Gamma}(80,60) \\ \alpha_r & \sim \text{Normal}(0,\sigma) \\ \\ \bf{\text{Likelihood}}\\ \alpha_w - \alpha_l | S &\sim L_{\text{match}}(\alpha_r - \alpha_s | S) \end{align*} \]

Example Data

Show code

# read match data from remote _target store
matches <- read_remote_target("matches")

matches_sample <-  matches %>%
  filter(event == '2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS', round == 'Finals', gender == 'Women') %>%
  mutate(across(everything(), as.character)) %>%
  select(event, date, gender, round, rider_1,rider_2,rider_id_1,rider_id_2, winner_id,loser_id, sprints) %>%
  slice(1)

matches_sample <- tibble(Field = names(matches_sample), Example = unlist(matches_sample[1,]))
matches_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
event	2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS
date	2020-02-28
gender	Women
round	Finals
rider_1	HINZE EMMA
rider_2	VOINOVA ANASTASIIA
rider_id_1	44
rider_id_2	123
winner_id	44
loser_id	123
sprints	2

Stan code


functions {
  real match_logit_lpmf(int[] s, vector delta){
    real lpmf = 0;
    vector[num_elements(delta)] p = inv_logit(delta);
    
    for(n in 1:num_elements(s)){
      if (s[n] == 1) lpmf += log(p[n]);
      else if (s[n] == 2) lpmf += 2*log(p[n]);
      else lpmf += log(2) + 2*log(p[n]) + log(1-p[n]);
    }
    return lpmf;
  }
  
  real accuracy(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1);
    real acc = 0;
    
    for(n in 1:num_elements(delta_seg)) acc += (delta_seg[n] > 0);
    
    return inv(num_elements(delta_seg)) * acc;
  }
  
  real log_loss(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real ll = 0;
    
    return inv(num_elements(delta_seg)) * bernoulli_logit_lpmf(1 | delta_seg);
  }
  
  real match_log_loss(int[] s, vector delta, int start, int end){
    int s_seg[end - start + 1]  = segment(s, start, end - start + 1);
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real mll = 0;
    
    return inv(num_elements(s_seg)) * match_logit_lpmf(s_seg | delta_seg);
  }
}

data {
  // Dimensions
  int<lower=0> R; // Riders
  int<lower=0> M; // Matches
  int<lower=0,upper = M> T; // Training matches
  
  // Match results
  int<lower=1,upper=R> winner_id[M]; // ID's specifying riders in match
  int<lower=1,upper=R> loser_id[M];
  int<lower=1,upper=3> sprints[M];
}

parameters {
  // rider ratings
  real<lower=0> sigma;
  vector[R] alpha0;
}

transformed parameters {
  // difference of winner and loser rating
  vector[M] delta =alpha0[winner_id] - alpha0[loser_id];
}

model {
  // (hierarchical) priors - strength
  sigma ~ gamma(80,60);
  alpha0 ~ normal(0,sigma);
  
  // likelihood
  head(sprints, T) ~ match_logit(head(delta, T));
}

generated quantities {
  // maximum theoretical strength difference between two riders
  real<lower=0> delta_max = max(alpha0) - min(alpha0);
  
  // Calculate log-loss, match log-loss and accuracy. A separate value is returned
  // for training/evaluation data, and within each of these the metrics are further
  // broken down by round (5 rounds in total) and the total (6th entry in vectors).
  real training_accuracy;
  real evaluation_accuracy;
  real training_match_log_loss;
  real evaluation_match_log_loss;

  training_accuracy = accuracy(delta,1, T);
  training_match_log_loss = match_log_loss(sprints, delta, 1,T);
  evaluation_accuracy = accuracy(delta, T+1, M);
  evaluation_match_log_loss = match_log_loss(sprints, delta, T+1, M);
}

The official list of track cyclists competing at Tokyo has now been published so we can create the strengths plot for the new model using the riders who are expected to take part.

Show code

riders <- read_remote_target("riders")

olympic_riders <- read_csv("olympic-athletes.csv") %>%
  mutate(Athlete = if_else(Athlete == 'ANTONOVA NATALIIA', 'ANTONOVA NATALIA', Athlete)) %>%
  inner_join(riders,.,  by = c("rider_name" = "Athlete"), keep = TRUE)

bt3_summ <- read_remote_target("bt_summary_bt3")

bt3_strength_summ <- bt3_summ %>%
  filter(str_detect(variable, 'alpha')) %>%
  mutate(rider_id = str_match(variable, '\\[(.*)\\]')[,2] %>% as.numeric()) %>%
  left_join(olympic_riders, .) %>%
  mutate(rider_name = fct_reorder(rider_name,median))

p_strengths_bt3 <- ggplot(bt3_strength_summ) +
  geom_segment(aes(x=q5,xend=q95,y=rider_name,yend=rider_name), color = infreq_palette["darkblue"]) +
  labs(y="",x="Strength, α")

p_strengths_bt3

There’s a compelling case for Lee Wai Sze to win gold - though this is based on the training data, so doesn’t include the latest results.

Match Performance Metrics

In light of changing the likelihood, its worth reviewing the evaluation metrics I’m using.

I can either measure accuracy by the proportion of separate sprints that are correctly predicted, or the proportion of matches. I’ll stick with match outcomes, since when applying the model this will be the measure of interest.

The log-loss metric is intrinsically linked to the likelihood of the model (its the average log likelihood of the data), so we should revise this to use the new likelihood. This will also need a revised benchmark.

For a best-of-three match there are four possible outcomes (either athlete winning in either 2 or 3 sprints), so a random guessing strategy has a log loss of $\log(\textstyle \frac14) \approx -1.39$.

The benchmark for the full data is a weighted average of the single sprint benchmark and the best-of-three. Note that this means that the match log loss benchmark differs between the training and evaluation splits.

Accuracy

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	0.648	0.716	0.773
bt2	evaluation	0.648	0.716	0.773
bt3	evaluation	0.648	0.716	0.773
bt1	training	0.764	0.787	0.807
bt2	training	0.770	0.791	0.810
bt3	training	0.768	0.789	0.808

Match Log Loss

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	-1.087	-0.919	-0.800
bt2	evaluation	-1.169	-0.951	-0.809
bt3	evaluation	-1.096	-0.928	-0.809
bt1	training	-0.642	-0.612	-0.586
bt2	training	-0.635	-0.603	-0.575
bt3	training	-0.612	-0.588	-0.567

Changing the likelihood has had a minimal impact on the evaluation metrics - with all three models defined to date performing similarly.

In hindsight this is to be expected as the volume of data that is evaluated differently under the revised likelihood is small: 86% of the training data are single sprints, for which both likelihoods agree, and only 2% have three sprints which is the scenario where the likelihoods differ the most.

I’ll carry this likelihood into the future models.

Home Advantage

A common adaptation of the Bradley-Terry model is to add a term to account for home advantage.

I didn’t have ready access to a list of each rider nationalities, but as a proxy I will use the nationality of the team they are riding for. I’ll consider a rider as being at home if the event is held in the country of their team.

The standard implementation of home advantage replaces the fixed strength $\alpha_r$, with a strength that varies by match:

\[ \alpha_{r}^{(m)} = \begin{cases} \alpha_r + \eta, &\text{ if $r$ is at home in match $m$,} \\ \alpha_r , & \text{otherwise.} \end{cases} \]

This assumes that all athletes feel the same benefitwhen competing at home, which I think is an over simplification. I’ll improve that assumption by using hierarchical modelling.

Hierarchical modelling allows us to vary the home advantage between riders, whilst accounting for the fact that we don’t have enough data to estimate each athlete’s effect independently.

The idea is to assume that there is some average effect, $\eta$, and that each rider then deviates from this average by some amount, $\theta_r$

\[\eta_r = \eta + \theta_r.\]

To help us define priors, I’ll consider the scenario where two riders with equal strength are competing, $\alpha_w - \alpha_l = 0$, and suppose one is at home. How much impact do we expect home advantage will have on the probability of winning?

No advantage would mean a win probability of 0.5, and I’d expect being at home might increase this to somewhere between 0.55 and 0.75. I can now turn that into a prior on $\eta$.

I also need a prior for the rider deviations $\theta_r$. Its reasonable to suppose this is normally distributed with mean 0, but we don’t know the appropriate standard deviation… and when in doubt, create a parameter!

I’ll let $\upsilon$ be this unknown standard deviation, so that $\theta_r \sim \text{Normal}(0,\upsilon)$, and put a prior on $\upsilon$.

Experimentation suggests priors $\eta \sim \Gamma(2,4)$ and $\upsilon \sim \text{HalfNormal}(0,0.2)$ align well to my heuristic above.

Parameter Priors

Show code

  home_effect_samples <- tibble(
    eta = rgamma(1e05, shape = 2, rate = 4),
    upsilon = abs(rnorm(1e05, mean = 0, sd =0.2))
  ) %>%
  mutate(
    p_win_avg = plogis(eta),
    theta = rnorm(n(), mean = 0, sd = upsilon),
    eta_theta = eta + theta,
    p_win_rider = plogis(eta_theta)
  )

home_param_prior <- home_effect_samples %>%
  # convert to long format 
  select(eta, theta) %>%
  gather(measure, sample, eta, theta) %>%
  mutate(measure = if_else(measure == "eta", 'Average', 'Rider Deviation'))

ggplot(home_param_prior, aes(sample, y = ..density.., fill = measure)) +
  geom_histogram(binwidth = 0.03, color = infreq_palette["beige"]) +
  # scale_x_continuous(breaks = seq(0, 1, by=0.1)) +
  coord_cartesian(xlim = c(-1,2)) +
  labs(x = "Home Advantage", y = "") +
  facet_grid(rows = vars(measure)) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    strip.background = element_blank(),  strip.text.y = element_text(size = 14),
    legend.position = "none"
  )

Home Win Prior

Show code

home_wins_prior <- home_effect_samples %>%
  # convert to long format 
  select(p_win_avg, p_win_rider) %>%
  gather(measure, sample, p_win_avg, p_win_rider) %>%
  mutate(measure = if_else(measure == "p_win_avg", 'Average', 'Rider'))
  
ggplot(home_wins_prior, aes(sample, y = ..density.., fill = measure)) +
  geom_histogram(binwidth = 0.01, color = infreq_palette["beige"]) +
  scale_x_continuous(breaks = seq(0, 1, by=0.1)) +
  coord_cartesian(xlim = c(0,1)) +
  labs(x = "Probability Home Rider Wins", y = "") +
  facet_grid(rows = vars(measure)) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    strip.background = element_blank(),  strip.text.y = element_text(size = 14)
  )

BT4

I’ll follow the convention I introduced above, denoting $\alpha_r^{(m)}$ for a strength that varies by match. Further I’ll let $\mathbf{1}_r^{(m)}$ denote the indicator that equals 1 if the rider, r, is at home in the match, m.

\[ \begin{align*} \bf{\text{Priors}} \\ \sigma & \sim \text{Gamma}(80,60) \\ \alpha_r & \sim \text{Normal}(0,\sigma) \\ \\ \eta & \sim \text{Gamma}(2,4) \\ \upsilon & \sim \text{Normal}(0, 0.2) \\ \theta_r & \sim \text{Normal}(0,\upsilon) \\ \\ \bf{\text{Likelihood}}\\ \alpha_r^{(m)} & = \alpha_r + (\eta + \theta_r)\mathbf{\large 1}_r^{(m)}\\ \alpha_w^{(m)} - \alpha_l^{(m)} | S & \sim L_{\text{match}}(\alpha_r^{(m)} - \alpha_s^{(m)} | S) \end{align*} \]

Example Data

Show code

# read match data from remote _target store
matches <- read_remote_target("matches")

matches_sample <-  matches %>%
  filter(event == '2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS', round == 'Finals', gender == 'Women') %>%
  mutate(across(everything(), as.character)) %>%
  select(event, date, gender, round, rider_1,rider_2,rider_id_1,rider_id_2, winner_id,loser_id, sprints, winner_at_home, loser_at_home) %>%
  slice(1)

matches_sample <- tibble(Field = names(matches_sample), Example = unlist(matches_sample[1,]))
matches_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
event	2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS
date	2020-02-28
gender	Women
round	Finals
rider_1	HINZE EMMA
rider_2	VOINOVA ANASTASIIA
rider_id_1	44
rider_id_2	123
winner_id	44
loser_id	123
sprints	2
winner_at_home	1
loser_at_home	0

Stan code


functions {
  real match_logit_lpmf(int[] s, vector delta){
    real lpmf = 0;
    vector[num_elements(delta)] p = inv_logit(delta);
    
    for(n in 1:num_elements(s)){
      if (s[n] == 1) lpmf += log(p[n]);
      else if (s[n] == 2) lpmf += 2*log(p[n]);
      else lpmf += log(2) + 2*log(p[n]) + log(1-p[n]);
    }
    return lpmf;
  }
  
  real accuracy(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1);
    real acc = 0;
    
    for(n in 1:num_elements(delta_seg)) acc += (delta_seg[n] > 0);
    
    return inv(num_elements(delta_seg)) * acc;
  }
  
  real log_loss(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real ll = 0;
    
    return inv(num_elements(delta_seg)) * bernoulli_logit_lpmf(1 | delta_seg);
  }
  
  real match_log_loss(int[] s, vector delta, int start, int end){
    int s_seg[end - start + 1]  = segment(s, start, end - start + 1);
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real mll = 0;
    
    return inv(num_elements(s_seg)) * match_logit_lpmf(s_seg | delta_seg);
  }
}

data {
  // Dimensions
  int<lower=0> R; // Riders
  int<lower=0> M; // Matches
  int<lower=0,upper = M> T; // Training matches

  // Match results
  int<lower=1,upper=R> winner_id[M]; // ID's specifying riders in match
  int<lower=1,upper=R> loser_id[M];
  int<lower=1,upper=3> sprints[M];
  
  // Home advantage effects
  real<lower=0,upper=1> winner_at_home[M];
  real<lower=0,upper=1> loser_at_home[M];
}

parameters {
  // rider ratings
  real<lower=0> sigma;
  vector[R] alpha0;
  
  // home advantage effect
  real<lower=0>eta;
  real<lower=0>upsilon;
  real theta[R];
}

transformed parameters {
  // difference of winner and loser rating
  vector[M] delta;
  real eta_theta[R];
  
  for(r in 1:R) eta_theta[r] = eta + theta[r];
  
  for(m in 1:M){
    
    delta[m] = 
      alpha0[winner_id[m]] - alpha0[loser_id[m]] +
      (winner_at_home[m] * eta_theta[winner_id[m]]) - (loser_at_home[m] * eta_theta[loser_id[m]]);
  }
}

model {
  // (hierarchical) priors - strength
  sigma ~ gamma(80,60);
  alpha0 ~ normal(0,sigma); 
  
  // (hierarchical) priors - home effect
  eta ~ gamma(2,4);
  upsilon ~ normal(0,0.2);
  theta ~ normal(0, upsilon);
  
  // likelihood
  head(sprints, T) ~ match_logit(head(delta, T));
}

generated quantities {
  // maximum theoretical strength difference between two riders
  real<lower=0> delta_max = max(alpha0) - min(alpha0);
  
  // Calculate log-loss, match log-loss and accuracy. A separate value is returned
  // for training/evaluation data, and within each of these the metrics are further
  // broken down by round (5 rounds in total) and the total (6th entry in vectors).
  real training_accuracy;
  real evaluation_accuracy;
  real training_match_log_loss;
  real evaluation_match_log_loss;

  training_accuracy = accuracy(delta,1, T);
  training_match_log_loss = match_log_loss(sprints, delta, 1,T);
  evaluation_accuracy = accuracy(delta, T+1, M);
  evaluation_match_log_loss = match_log_loss(sprints, delta, T+1, M);
}

In Tokyo 2020 there’s a single female Japanese athlete, Yuka Kobayashi. The plot below shows the posterior ranges for the rider strengths, and shows the impact that the home advantage is expected to have (in blue).

Show code

bt4_draws <- read_remote_target("bt_draws_bt4")

japanese_riders <- olympic_riders %>% filter(Nationality == 'Japan') %>%
  mutate(variable = paste0('theta[', rider_id,']'))

bt4_draws <- bt4_draws %>%
  gather(variable, value, -starts_with("."))

bt4_rider_home_effect <- left_join(japanese_riders, bt4_draws) %>%
  select(-variable) %>%
  rename(rider_home_effect = value)

bt4_home_effect <- bt4_draws %>%
  filter(str_detect(variable, '^eta')) %>%
  rename(home_effect = value) %>%
  select(-variable) %>%
  left_join(bt4_rider_home_effect) %>%
  mutate(rider_home_effect = rider_home_effect + home_effect)

bt4_strengths <- bt4_draws %>%
  filter(str_detect(variable, 'alpha')) %>%
  mutate(rider_id = str_match(variable, '\\[(.*)\\]')[,2] %>% as.numeric()) %>%
  left_join(olympic_riders, .) %>%
  left_join(bt4_home_effect) %>%
  replace_na(list(rider_home_effect = 0))

bt4_strength_summ <- bt4_strengths %>%
  group_by(rider_name) %>%
  summarise(
    is_not_home = all(is.na(home_effect )),
    q5 = quantile(value, 0.05),
    median = quantile(value, 0.5),
    q95 = quantile(value, 0.95),
    ha_q5 = quantile(value + rider_home_effect, 0.05),
    ha_q95 = quantile(value + rider_home_effect, 0.95),
  ) %>%
  mutate(rider_name = fct_reorder(rider_name,median))

ggplot(bt4_strength_summ) +
  geom_segment(aes(x=q5,xend=q95,y=rider_name,yend=rider_name), color = infreq_palette["darkblue"]) +
  geom_segment(data = bt4_strength_summ %>% filter(is_not_home == FALSE), aes(x=ha_q5,xend=ha_q95,y=rider_name,yend=rider_name), color = infreq_palette["orange"], size = 1.1) +
  labs(y="",x="Strength, α")

Unfortunately for Yuka, it would appear that the home advantage is unlikely to be sufficient to help her secure a medal.

In hindsight home advantage is not likely to have a significant impact on the evaluation metrics as so few of the matches have a home athlete; this is confirmed below.

Accuracy

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	0.648	0.716	0.773
bt2	evaluation	0.648	0.716	0.773
bt3	evaluation	0.648	0.716	0.773
bt4	evaluation	0.648	0.716	0.773
bt1	training	0.764	0.787	0.807
bt2	training	0.770	0.791	0.810
bt3	training	0.768	0.789	0.808
bt4	training	0.770	0.791	0.808

Match Log Loss

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	-1.087	-0.919	-0.800
bt2	evaluation	-1.169	-0.951	-0.809
bt3	evaluation	-1.096	-0.928	-0.809
bt4	evaluation	-1.080	-0.916	-0.798
bt1	training	-0.642	-0.612	-0.586
bt2	training	-0.635	-0.603	-0.575
bt3	training	-0.612	-0.588	-0.567
bt4	training	-0.613	-0.588	-0.566

Time Varying Strengths

Up to now I’ve made the assumption that the athletes’ strengths do not vary over time. Given the data spans a period of multiple years this is a definite limitation in the model.

In my early attempts to introduce a time dependent component I considered a random walk dynamic, supposing that

\[ \alpha_r^{(m)} = \alpha_r^{(m-1)} + \text{Normal}(0, \sigma_{m}).\]

where the standard deviation $\sigma_m$ depends on the time since the last match; appealing to theory about random walks, I’d supposed this was proportionate to the square root of the time difference to the last match.

The problem with the model as defined above is that it requires that we determine some initial position, $\alpha_r^{(0)}$ from which successive strengths move away from, with uncertainty growing over time. This is appropriate if there is some fixed time point at which we know the riders’ strengths - but this is not the case in our situation.

To account for this a more sophisticated approach is required - and for that I’ll make use of Gaussian Processes (GPs). I won’t fully introduce GPs here, but they provide a framework for putting a distribution on functions, with the property that the value that the function takes at any finite subset of points are Normally distributed with some (known) covariance matrix.

I’ll fit a relatively simple GP, for each athlete’s strength at a given date.These will be determined by two parameters (per athlete) that determine how far the athlete’s strength is allowed to deviate from its average (the magnitude) and how fast it can change (the length scale).

I’ll admit now I haven’t had a chance to fully rationalise my position on priors for the Gaussian Process (which might go some way to explaining the model performance below!) - so won’t labour the explanation for the decision. Suffice it to say that I wanted to control the magnitude so that rider strengths wouldn’t fluctuate by more than $\pm 1$ over their career, and changes would happen on the scale of years, not months.

The plot below shows draws from the prior distribution for the GP on a single rider - giving a feel for the types of trends that the prior distribution is putting weight on. The GP I’ve chosen is stationary meaning that at any point the expected value is 0.

Show code

rider_days <- read_remote_target("rider_days") %>% mutate(date_index = 1:n())

alphaD_prior_draws <- read_remote_target("bt_draws_gp_prior")  %>%
      filter(.draw %% 800 == 0 ) %>%
      gather(variable, value, -starts_with(".")) %>%
      filter(str_detect(variable, 'f')) %>%
      mutate(date_index = str_match(variable, "\\[(.*)\\]")[,2] %>% as.numeric()) %>%
      left_join(rider_days) %>%
      left_join(riders) %>%
      filter(rider_name %in% c("LEE WAI SZE", "VOINOVA ANASTASIIA", "HINZE EMMA"))

ggplot(data = alphaD_prior_draws %>% filter(rider_name == 'LEE WAI SZE')) +
  geom_line( aes(x=date, y=value,color=rider_name, group = .draw), alpha = 0.2) +
  geom_line(aes(x = date, y = 0), size = 2 ) +
  ylab("Strength") + xlab("") + theme(legend.position = "none")

BT5 \[ \begin{align*} \bf{\text{Priors}} \\ \sigma & \sim \text{Gamma}(80,60) \\ \alpha_r & \sim \text{Normal}(0,\sigma) \\ \\ \eta & \sim \text{Gamma}(2,4) \\ \upsilon & \sim \text{Normal}(0, 0.2) \\ \theta_r & \sim \text{Normal}(0,\upsilon) \\ \\ \tau_0 & \sim \text{InvGamma}(11,1)\\ \rho_0 & \sim \text{InvGamma}(6,3)\\ \tau_r & \sim \text{Normal}(\tau_0, 0.05)\\ \rho_r & \sim \text{Normal}(\rho_0, 0.1)\\ f_r^{(m)} & \sim \text{GaussianProcess}(0, K^{SE}(\tau_r, \rho_r)) \\ \\ \bf{\text{Likelihood}}\\ \alpha_r^{(m)} & = \alpha_r + (\eta + \theta_r)\mathbf{\large 1}_r^{(m)} + f_r^{(m)}\\ \alpha_w^{(m)} - \alpha_l^{(m)} | S & \sim L_{\text{match}}(\alpha_r^{(m)} - \alpha_s^{(m)} | S) \end{align*} \]

Example Data

To fit this model two data sets were required. The match data now has additional columns indicating a date index for each rider

Show code

# read match data from remote _target store
matches <- read_remote_target("matches")

matches_sample <-  matches %>%
  filter(event == '2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS', round == 'Finals', gender == 'Women') %>%
  mutate(across(everything(), as.character)) %>%
  select(event, date, gender, round, rider_1,rider_2,rider_id_1,rider_id_2, winner_id,loser_id, sprints, winner_at_home, loser_at_home,
         winner_date_no, loser_date_no) %>%
  slice(1)

matches_sample <- tibble(Field = names(matches_sample), Example = unlist(matches_sample[1,]))
matches_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
event	2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS
date	2020-02-28
gender	Women
round	Finals
rider_1	HINZE EMMA
rider_2	VOINOVA ANASTASIIA
rider_id_1	44
rider_id_2	123
winner_id	44
loser_id	123
sprints	2
winner_at_home	1
loser_at_home	0
winner_date_no	21
loser_date_no	23

These then link to a separate data set that provides information about the dates on which the athlete has raced

Show code

rider_days_sample <-  rider_days[100,] %>%
  mutate(across(everything(), as.character))

rider_days_sample <- tibble(Field = names(rider_days_sample), Example = unlist(rider_days_sample[1,]))
rider_days_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
date	2021-08-02
rider_id	16
rider_date_no	1
days_to_prev	1082
days_to_start	1082
years_to_start	2.96235455167693
date_index	100

Stan code

Since the model requires in excess of 100 GPs to be fit, I’ve made use of the Hilbert space approximation to GPs analysed by Riutort-Mayol, et al.

A recent case study by Aki Vehtari makes for a lighter introduction.

functions {
  real match_logit_lpmf(int[] s, vector delta){
    real lpmf = 0;
    vector[num_elements(delta)] p = inv_logit(delta);
    
    for(n in 1:num_elements(s)){
      if (s[n] == 1) lpmf += log(p[n]);
      else if (s[n] == 2) lpmf += 2*log(p[n]);
      else lpmf += log(2) + 2*log(p[n]) + log(1-p[n]);
    }
    return lpmf;
  }
  
  real accuracy(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1);
    real acc = 0;
    
    for(n in 1:num_elements(delta_seg)) acc += (delta_seg[n] > 0);
    
    return inv(num_elements(delta_seg)) * acc;
  }
  
  real log_loss(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real ll = 0;
    
    return inv(num_elements(delta_seg)) * bernoulli_logit_lpmf(1 | delta_seg);
  }
  
  real match_log_loss(int[] s, vector delta, int start, int end){
    int s_seg[end - start + 1]  = segment(s, start, end - start + 1);
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real mll = 0;
    
    return inv(num_elements(s_seg)) * match_logit_lpmf(s_seg | delta_seg);
  }
  
  // basis function approx. to GPs, developed by Riutort-Mayol et al. arxiv.org/2004.11408
  // implementation from
  // https://github.com/avehtari/casestudies/blob/master/Motorcycle/gpbasisfun_functions.stan
  vector diagSPD_exp_quad(real alpha, real rho, real L, int B) {
  return sqrt((alpha^1) * sqrt(2*pi()) * rho * exp(-0.5*(rho*pi()/2/L)^2 * linspaced_vector(B, 1, B)^2));
  }
  
  matrix PHI(int N, int B, real L, vector x) {
  return sin(diag_post_multiply(rep_matrix(pi()/(2*L) * (x+L), B), linspaced_vector(B, 1, B)))/sqrt(L);
  }
}

data {
  // Dimensions
  int<lower=0> R; // Riders
  int<lower=0> M; // Matches
  int<lower=0,upper = M> T; // Training matches
  int<lower=0> D; // Dates
  
  int<lower=1,upper=R> winner_id[M]; // ID's specifying riders in match
  int<lower=1,upper=R> loser_id[M];
  int<lower=1,upper=3> sprints[M];
    
  // Home advantage effects
  real<lower=0,upper=1> winner_at_home[M];
  real<lower=0,upper=1> loser_at_home[M];
  
  // Time series effects
  int<lower=0> winner_date_no[M];
  int<lower=0> loser_date_no[M];
  int<lower=0> date_index_R[R+1];
  vector[D] rider_dates;
  
  // basis function approx.
  int<lower=1> B; 
}

transformed data {
  vector[D] d_sc;  // rider dates centred/scaled
  real<lower=0> L; // GP basis function window length
  matrix[D,B] PHI_f; // GP basis function matrix
  
  {
    real mu_d = mean(rider_dates);
    real sd_d = sd(rider_dates);
    d_sc = (rider_dates-mu_d)/sd_d;
    L = 1.5 * max(d_sc); // set based on https://avehtari.github.io/casestudies/Motorcycle/motorcycle.html
  }
  
  PHI_f = PHI(D, B, L, d_sc);
}

parameters {
  // Rider average strength
  real alpha0[R];
  real<lower=0>sigma;
    
  // home advantage effect
  real<lower=0>eta;
  real<lower=0>upsilon;
  real theta[R];
  
  // GP parameters and hyperparameters
    real<lower=0> rho_pr;   // lengthscale hyperparameter
    real<lower=0> tau_pr;   // magnitude hyperparameter
    vector<lower=0>[R] rho; // lengthscale
    vector<lower=0>[R] tau; // magnitude
    vector[R*B] zeta;       // latent variables
}

transformed parameters{
    vector[D] f;       // approximate GP
    vector[D] alphaD;  // approx. GP with centered on average strength
  vector[M] delta;   // match differences
  real eta_theta[R];
  
  for(r in 1:R){
    
    eta_theta[r] = eta + theta[r];
    
    vector[B] diagSPD_f_r = diagSPD_exp_quad(tau[r], rho[r], L, B);
    
    // basis function approx. to Gaussian Process
    f[date_index_R[r]:(date_index_R[r+1]-1)] =
    PHI_f[date_index_R[r]:(date_index_R[r+1]-1),] * (diagSPD_f_r .* segment(zeta, 1 + (r-1)*B, B));

    // time dependent rider strength
    alphaD[date_index_R[r]:(date_index_R[r+1]-1)] =  alpha0[r] + f[date_index_R[r]:(date_index_R[r+1]-1)];
  }
      
    for(m in 1:M){
      delta[m] = 
       alphaD[date_index_R[winner_id[m]] + winner_date_no[m]] - alphaD[date_index_R[loser_id[m]] + loser_date_no[m]] +
      (winner_at_home[m] * eta_theta[winner_id[m]]) - (loser_at_home[m] * eta_theta[loser_id[m]]);
    }
}

model{
  // (hierarchical) priors - strength
  sigma ~ gamma(80,60);
  alpha0 ~ normal(0,sigma); 
  
  // (hierarchical) priors - home effect
  eta ~ gamma(2,4);
  upsilon ~ normal(0,0.2);
  theta ~ normal(0, upsilon);

  // Gaussian process lengthscale/magnitude
  rho_pr ~ inv_gamma(6,3);
  tau_pr ~ inv_gamma(11,1);
    rho ~ normal(rho_pr, 0.1);      // lengthscale GP
    tau ~ normal(tau_pr, 0.05);     // magnitude GP
    zeta ~ normal(0,1);       // latent variables for approx. Gaussian Process
    
  // likelihood
  head(sprints, T) ~ match_logit(head(delta, T));
}

generated quantities {
  // maximum theoretical strength difference between two riders
  real<lower=0> delta_max = max(alpha0) - min(alpha0);
  
  // Calculate log-loss, match log-loss and accuracy. A separate value is returned
  // for training/evaluation data, and within each of these the metrics are further
  // broken down by round (5 rounds in total) and the total (6th entry in vectors).
  real training_accuracy;
  real evaluation_accuracy;
  real training_match_log_loss;
  real evaluation_match_log_loss;

  training_accuracy = accuracy(delta,1, T);
  training_match_log_loss = match_log_loss(sprints, delta, 1,T);
  evaluation_accuracy = accuracy(delta, T+1, M);
  evaluation_match_log_loss = match_log_loss(sprints, delta, T+1, M);
}

The posterior plots below shows how the GP fits pick up changes in the riders strenghts over time - and suggest that Lee Wai Sze’s form may be starting to decline.

Show code

rider_days <- read_remote_target("rider_days") %>% mutate(date_index = 1:n())

    alphaD_draws <- read_remote_target("bt_draws_bt5")  %>%
      filter(.draw %% 100 == 0) %>%
      gather(variable, value, -starts_with(".")) %>%
      filter(str_detect(variable, 'alphaD')) %>%
      mutate(date_index = str_match(variable, "\\[(.*)\\]")[,2] %>% as.numeric()) %>%
      left_join(rider_days) %>%
      left_join(riders) %>%
      filter(rider_name %in% c("LEE WAI SZE", "VOINOVA ANASTASIIA", "HINZE EMMA"))

ggplot() +
  geom_line(data = alphaD_draws %>% filter(rider_name == 'LEE WAI SZE'), aes(x=date, y=value,color=rider_name, group = .draw), alpha = 0.1) +
  geom_smooth(data = alphaD_draws %>% filter(rider_name == 'LEE WAI SZE'), aes(x=date, y=value,color=rider_name), se = FALSE, size = 2) +
  
  geom_line(data = alphaD_draws %>% filter(rider_name == 'HINZE EMMA'), aes(x=date, y=value,color=rider_name, group = .draw), alpha = 0.1) +
  geom_smooth(data = alphaD_draws %>% filter(rider_name == 'HINZE EMMA'), aes(x=date, y=value,color=rider_name), se = FALSE, size = 2) +
  
  geom_line(data = alphaD_draws %>% filter(rider_name == "VOINOVA ANASTASIIA"), aes(x=date, y=value,color=rider_name, group = .draw), alpha = 0.1) +
  geom_smooth(data = alphaD_draws %>% filter(rider_name == "VOINOVA ANASTASIIA"), aes(x=date, y=value,color=rider_name), se = FALSE, size = 2) +
  theme(legend.title = element_blank()) +
  ylab("Strength") + xlab("")

The model summary below indicates that introducing the GP appears to lower the accuracy on the evaluation data: although in reality this is the difference of just a single match.

Accuracy

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	0.648	0.716	0.773
bt2	evaluation	0.648	0.716	0.773
bt3	evaluation	0.648	0.716	0.773
bt4	evaluation	0.648	0.716	0.773
bt5	evaluation	0.636	0.705	0.761
bt1	training	0.764	0.787	0.807
bt2	training	0.770	0.791	0.810
bt3	training	0.768	0.789	0.808
bt4	training	0.770	0.791	0.808
bt5	training	0.780	0.803	0.824

Match Log Loss

Data

Show code

log_loss <- measures %>%
  # totals are given in round_id = 6
  filter(measure == "match_log_loss")

log_loss %>% 
  select(model, split,  q5, median, q95) %>%
  arrange(split, model) %>%
  kable("pipe", col.names =c("Model", "Split", "5%", "50% (Median)", "95%"), digits =3)

Model	Split	5%	50% (Median)	95%
bt1	evaluation	-1.087	-0.919	-0.800
bt2	evaluation	-1.169	-0.951	-0.809
bt3	evaluation	-1.096	-0.928	-0.809
bt4	evaluation	-1.080	-0.916	-0.798
bt5	evaluation	-1.130	-0.942	-0.808
bt1	training	-0.642	-0.612	-0.586
bt2	training	-0.635	-0.603	-0.575
bt3	training	-0.612	-0.588	-0.567
bt4	training	-0.613	-0.588	-0.566
bt5	training	-0.589	-0.559	-0.532

Perhaps against better judgement I’m going to retain the GP contribution in the model going forward. This might be the result of bias - a reluctance to throw away my hard work - but I’d like to think I have a few more rational justifications:

The difference in accuracy is negligible, and given the small sample size of the evaluation data I don’t consider this to be particularly reliable.
The worsening in the log loss is due to some of the more extreme strengths assigned to Lee Wai Sze at her peak. The fit suggests she is now on a decline, and if that is the case we want the model to account for this as her data will be the single biggest driver of the odds.

The next, and final, model adaptation will reduce the contribution of the Gaussian Process.

I’ll reflect on this decision in the final post, once predictions are made and the actual results are known.

Qualifying Times

The final revision I’ll make to the model is to take into account one final dynamic of the event format - the qualifying round.

Prior to any individual matches taking place, each athlete completes a sprint on their own (usually a 200m effort, with a flying start). The times for this sprint decide the seeding of the matches: with the fastest rider facing the slowest qualifying rider, and so on.

Assuming that athletes complete the qualifying round with the aim of being the fastest this will be a good predictor of the pure speed they currently have. The combination of this current form, and their tactical skill (measurable from our historic match data) should combine to improve the model.

For a given match $m$, I’ll denote $q_{r}^{(m)}$ for the time the athlete posted in the qualifying round for the event, and will update the rider’s strength for the match to have an additional factor $-\kappa q_{r}^{(m)}$. The parameter $\kappa$ will be fixed across all riders and events.

As with a number of the other parameters it makes sense to assume this can vary: since speeds in velodromes are highly dependent on the the conditions (air pressure, temperature, altitude) so it makes sense to allow $\kappa$ to vary between events. I will therefore consider separate coefficients for each event:

\[ \kappa^{(m)} = \kappa + \phi_{E_m}\] where $m$ is a match taking place during event $E_m$.

As usual we need to choose a prior for $\kappa$ - I’ll do a back of the envelope calculation to set an appropriate scale. First off, qualifying time differences are usually between 0 and 0.5 seconds: so lets take the common scenario of a time difference of 0.2 seconds.

Suppose then that this time difference is observed, but otherwise two riders are believed to have the same strength. In that scenario I’d back the faster qualifier to win with a probability between 0.55 and 0.8. Converting back, that implies values of $\kappa$ between 1 and 7.

I don’t want to rule out the scenario that perhaps qualifying faster is detrimental - so overall I’ll pick a fairly broad prior of $\kappa \sim \text{Normal}(4,4)$ which allows for that scenario, whilst favouring my estimated range above.

For the hierarchical event level term I’ll suppose $\phi_{E_m} \sim \text{Normal}(0,\psi)$ and put a prior on the unknown standard deviation $\psi \sim \text{HalfNormal}(0, \frac12)$.

BT6 \[ \begin{align*} \bf{\text{Priors}} \\ \sigma & \sim \text{Gamma}(80,60) \\ \alpha_r & \sim \text{Normal}(0,\sigma) \\ \\ \eta & \sim \text{Gamma}(2,4) \\ \upsilon & \sim \text{Normal}(0, 0.2) \\ \theta_r & \sim \text{Normal}(0,\upsilon) \\ \\ \tau_0 & \sim \text{InvGamma}(11,1)\\ \rho_0 & \sim \text{InvGamma}(6,3)\\ \tau_r & \sim \text{Normal}(\tau_0, 0.05)\\ \rho_r & \sim \text{Normal}(\rho_0, 0.1)\\ f_r^{(m)} & \sim \text{GaussianProcess}(0, K^{SE}(\tau_r, \rho_r)) \\ \\ \kappa & \sim \text{Normal}(4,4)\\ \psi & \sim \text{HalfNormal}(0,\frac12)\\ \phi_{E_m} & \sim \text{Normal}(0,\psi) \\ \bf{\text{Likelihood}}\\ \alpha_r^{(m)} & = \alpha_r + (\eta + \theta_r)\mathbf{\large 1}_r^{(m)} + f_r^{(m)} - (\kappa + \phi_{E_m}) q_r^{(m)}\\ \alpha_w^{(m)} - \alpha_l^{(m)} | S & \sim L_{\text{match}}(\alpha_r^{(m)} - \alpha_s^{(m)} | S) \end{align*} \]

Example Data

As in model BT5, we need two data sets. In addition to the data introduced there, we now have the further winner_qual_time_diff column with the qualifying time differences between the two riders.

Show code

# read match data from remote _target store
matches <- read_remote_target("matches")

matches_sample <-  matches %>%
  filter(event == '2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS', round == 'Finals', gender == 'Women') %>%
  mutate(across(everything(), as.character)) %>%
  select(event, date, gender, round, rider_1,rider_2,rider_id_1,rider_id_2, winner_id,loser_id, sprints, winner_at_home, loser_at_home,
         winner_date_no, loser_date_no, winner_qual_time_diff) %>%
  slice(1)

matches_sample <- tibble(Field = names(matches_sample), Example = unlist(matches_sample[1,]))
matches_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
event	2020 UCI TRACK WORLD CYCLING CHAMPIONSHIPS
date	2020-02-28
gender	Women
round	Finals
rider_1	HINZE EMMA
rider_2	VOINOVA ANASTASIIA
rider_id_1	44
rider_id_2	123
winner_id	44
loser_id	123
sprints	2
winner_at_home	1
loser_at_home	0
winner_date_no	21
loser_date_no	23
winner_qual_time_diff	-0.257999999999999

These then link to a separate data set that provides information about the dates on which the athlete has raced

Show code

rider_days <- read_remote_target("rider_days")

rider_days_sample <-  rider_days[100,] %>%
  mutate(across(everything(), as.character))

rider_days_sample <- tibble(Field = names(rider_days_sample), Example = unlist(rider_days_sample[1,]))
rider_days_sample %>%
  kable("pipe") %>%
  kable_styling(full_width=FALSE, bootstrap_options = c("striped", "hover", "condensed"),font_size = 10)

Field	Example
date	2021-08-02
rider_id	16
rider_date_no	1
days_to_prev	1082
days_to_start	1082
years_to_start	2.96235455167693

Stan code

functions {
  real match_logit_lpmf(int[] s, vector delta){
    real lpmf = 0;
    vector[num_elements(delta)] p = inv_logit(delta);
    
    for(n in 1:num_elements(s)){
      if (s[n] == 1) lpmf += log(p[n]);
      else if (s[n] == 2) lpmf += 2*log(p[n]);
      else lpmf += log(2) + 2*log(p[n]) + log(1-p[n]);
    }
    return lpmf;
  }
  
  real accuracy(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1);
    real acc = 0;
    
    for(n in 1:num_elements(delta_seg)) acc += (delta_seg[n] > 0);
    
    return inv(num_elements(delta_seg)) * acc;
  }
  
  real log_loss(vector delta, int start, int end){
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real ll = 0;
    
    return inv(num_elements(delta_seg)) * bernoulli_logit_lpmf(1 | delta_seg);
  }
  
  real match_log_loss(int[] s, vector delta, int start, int end){
    int s_seg[end - start + 1]  = segment(s, start, end - start + 1);
    vector[end - start + 1] delta_seg = segment(delta, start, end - start + 1); 
    real mll = 0;
    
    return inv(num_elements(s_seg)) * match_logit_lpmf(s_seg | delta_seg);
  }
  
  // basis function approx. to GPs, developed by Riutort-Mayol et al. arxiv.org/2004.11408
  // implementation from
  // https://github.com/avehtari/casestudies/blob/master/Motorcycle/gpbasisfun_functions.stan
  vector diagSPD_exp_quad(real alpha, real rho, real L, int B) {
  return sqrt((alpha^1) * sqrt(2*pi()) * rho * exp(-0.5*(rho*pi()/2/L)^2 * linspaced_vector(B, 1, B)^2));
  }
  
  matrix PHI(int N, int B, real L, vector x) {
  return sin(diag_post_multiply(rep_matrix(pi()/(2*L) * (x+L), B), linspaced_vector(B, 1, B)))/sqrt(L);
  }
}

data {
  // Dimensions
  int<lower=0> R; // Riders
  int<lower=0> M; // Matches
  int<lower=0,upper = M> T; // Training matches
  int<lower=0> D; // Dates
  int<lower=0> E; // Events
  
  int<lower=1,upper=R> winner_id[M]; // ID's specifying riders in match
  int<lower=1,upper=R> loser_id[M];
  int<lower=1,upper=3> sprints[M];
    
  // Home advantage effects
  real<lower=0,upper=1> winner_at_home[M];
  real<lower=0,upper=1> loser_at_home[M];
  
  // Time series effects
  int<lower=0> winner_date_no[M];
  int<lower=0> loser_date_no[M];
  int<lower=0> date_index_R[R+1];
  vector[D] rider_dates;
  
  // basis function approx.
  int<lower=1> B;
  
  // qualifying time difference;
  real qual_diff[M];
  int event[M];
}

transformed data {
  vector[D] d_sc;  // rider dates centred/scaled
  real<lower=0> L; // GP basis function window length
  matrix[D,B] PHI_f; // GP basis function matrix
  
  {
    real mu_d = mean(rider_dates);
    real sd_d = sd(rider_dates);
    d_sc = (rider_dates-mu_d)/sd_d;
    L = 1.5 * max(d_sc); // set based on https://avehtari.github.io/casestudies/Motorcycle/motorcycle.html
  }
  
 PHI_f = PHI(D, B, L, d_sc);
}

parameters {
  // Rider average strength
  real alpha0[R];
  real<lower=0>sigma;
    
  // home advantage effect
  real<lower=0>eta;
  real<lower=0>upsilon;
  real theta[R];
  
  // GP parameters and hyperparameters
    real<lower=0> rho_pr;   // lengthscale hyperparameter
    real<lower=0> tau_pr;   // magnitude hyperparameter
    vector<lower=0>[R] rho; // lengthscale
    vector<lower=0>[R] tau; // magnitude
    vector[R*B] zeta;       // latent variables
    
    // qualifying time differences;
    real kappa;
    real<lower=0> psi;
    real phi[E];
}

transformed parameters{
    vector[D] f;       // approximate GP
    vector[D] alphaD;  // approx. GP with centered on average strength
  vector[M] delta;   // match differences
  real eta_theta[R];
  
  for(r in 1:R){
    
    eta_theta[r] = eta + theta[r];
    
    vector[B] diagSPD_f_r = diagSPD_exp_quad(tau[r], rho[r], L, B);
    
    // basis function approx. to Gaussian Process
    f[date_index_R[r]:(date_index_R[r+1]-1)] =
    PHI_f[date_index_R[r]:(date_index_R[r+1]-1),] * (diagSPD_f_r .* segment(zeta, 1 + (r-1)*B, B));

    // time dependent rider strength
    alphaD[date_index_R[r]:(date_index_R[r+1]-1)] =  alpha0[r] + f[date_index_R[r]:(date_index_R[r+1]-1)];
  }
      
    for(m in 1:M){
      delta[m] = alphaD[date_index_R[winner_id[m]] + winner_date_no[m]] - alphaD[date_index_R[loser_id[m]] + loser_date_no[m]] +
      (winner_at_home[m] * eta_theta[winner_id[m]]) - (loser_at_home[m] * eta_theta[loser_id[m]]) -
      (kappa + phi[event[m]]) * qual_diff[m];
    }
}

model{
  // (hierarchical) priors - strength
  sigma ~ gamma(80,60);
  alpha0 ~ normal(0,sigma); 
  
  // (hierarchical) priors - home effect
  eta ~ gamma(2,4);
  upsilon ~ normal(0,0.2);
  theta ~ normal(0, upsilon);

  // Gaussian process lengthscale/magnitude
  rho_pr ~ inv_gamma(6,3);
  tau_pr ~ inv_gamma(11,1);
    rho ~ normal(rho_pr, 0.1);      // lengthscale GP
    tau ~ normal(tau_pr, 0.05);     // magnitude GP
    zeta ~ normal(0,1);       // latent variables for approx. Gaussian Process

    // qualifying time difference;
    kappa ~ normal(4,4);
    psi ~ normal(0,0.5);
    phi ~ normal(0, psi);
    

  head(sprints, T) ~ match_logit(head(delta, T));
}

generated quantities {
  // maximum theoretical strength difference between two riders
  real<lower=0> delta_max = max(alpha0) - min(alpha0);
  
  // Calculate log-loss, match log-loss and accuracy. A separate value is returned
  // for training/evaluation data, and within each of these the metrics are further
  // broken down by round (5 rounds in total) and the total (6th entry in vectors).
  real training_accuracy;
  real evaluation_accuracy;
  real training_match_log_loss;
  real evaluation_match_log_loss;

  training_accuracy = accuracy(delta,1, T);
  training_match_log_loss = match_log_loss(sprints, delta, 1,T);
  evaluation_accuracy = accuracy(delta, T+1, M);
  evaluation_match_log_loss = match_log_loss(sprints, delta, T+1, M);
}

Including the qualifying data turns out to be the magic bullet to boost the model’s performance, substantially lifting both the accuracy and match log loss.

Accuracy

Data

Model	Split	5%	50% (Median)	95%
bt1	evaluation	0.648	0.716	0.773
bt2	evaluation	0.648	0.716	0.773
bt3	evaluation	0.648	0.716	0.773
bt4	evaluation	0.648	0.716	0.773
bt5	evaluation	0.636	0.705	0.761
bt6	evaluation	0.761	0.818	0.852
bt1	training	0.764	0.787	0.807
bt2	training	0.770	0.791	0.810
bt3	training	0.768	0.789	0.808
bt4	training	0.770	0.791	0.808
bt5	training	0.780	0.803	0.824
bt6	training	0.843	0.860	0.877

Match Log Loss

Data

Show code

log_loss <- measures %>%
  # totals are given in round_id = 6
  filter(measure == "match_log_loss")

log_loss %>% 
  select(model, split,  q5, median, q95) %>%
  arrange(split, model) %>%
  kable("pipe", col.names =c("Model", "Split", "5%", "50% (Median)", "95%"), digits =3)

Model	Split	5%	50% (Median)	95%
bt1	evaluation	-1.087	-0.919	-0.800
bt2	evaluation	-1.169	-0.951	-0.809
bt3	evaluation	-1.096	-0.928	-0.809
bt4	evaluation	-1.080	-0.916	-0.798
bt5	evaluation	-1.130	-0.942	-0.808
bt6	evaluation	-0.909	-0.751	-0.625
bt1	training	-0.642	-0.612	-0.586
bt2	training	-0.635	-0.603	-0.575
bt3	training	-0.612	-0.588	-0.567
bt4	training	-0.613	-0.588	-0.566
bt5	training	-0.589	-0.559	-0.532
bt6	training	-0.445	-0.423	-0.404

Recall that in the first post I defined the prior on the main athlete strengths, $\alpha_r$, with the belief that the odds that the weakest athlete can beat the strongest would be between 1-100 and 1-1,000.

In all our models this prior stopped the model from erring towards really extreme values for these odds - but never brought them down to the scale I believe plausible. The table below summarises the posterior odds under the two latest models, showing that including qualifying data finally makes these extremal odds look plausible.

Show code

bt5_max_odds <- bt5_summ %>%
  filter(str_detect(variable, 'delta_max')) %>%
  select(q5, median, q95) %>%
  mutate(
    # calculation is based on the fact that the strength difference is the log odds
    # on the exponential scale
    across(everything(), ~format(round(10^(./log(10)),-2),big.mark = ","))
  )

bt6_max_odds <- bt6_summ %>%
  filter(str_detect(variable, 'delta_max')) %>%
  select(q5, median, q95) %>%
  mutate(
    # calculation is based on the fact that the strength difference is the log odds
    # on the exponential scale
    across(everything(), ~format(round(10^(./log(10)),-2),big.mark = ","))
  )

bind_rows(
  bt5_max_odds %>% add_column(model = 'bt5', .before = 0),
  bt6_max_odds %>% add_column(model = 'bt6', .before = 0)
) %>%
  arrange(model) %>%
  kable(col.names = c("", "5%", "50%", "95%"), digits = 2) %>%
  kable_styling(full_width=FALSE)

	5%	50%	95%
bt5	1,100	4,600	30,200
bt6	100	500	2,500

One limitation of this final model is that we won’t be able to use its fully predictive power until the qualifying rounds at Tokyo 2020 have taken place. I’ll be looking to make some bets before and after qualifying, so will need to make use of the time dependent strength values from this model fit in the pre-qualifying bets.

The Final Fit and Reflections

Before wrapping up I’ll fit the model one more time, this time using all of the data.

The plot below provides the rider strengths as forecast for the 2nd August 2021 - the week over which the Individual Sprint medals will be contested. Importantly this model not only puts the rider strengths on a tighter scale, but also changes some of the athlete orders.

Show code

rider_days <- rider_days %>% mutate(date_id = 1:n())

bt_final_draws <- read_remote_target("bt_draws_bt_final")

bt_final_draws <- bt_final_draws %>%
  gather(variable, value, -starts_with("."))

bt_final_rider_home_effect <- left_join(japanese_riders, bt_final_draws) %>%
  select(-variable) %>%
  rename(rider_home_effect = value)

bt_final_home_effect <- bt_final_draws %>%
  filter(str_detect(variable, '^eta')) %>%
  rename(home_effect = value) %>%
  select(-variable) %>%
  left_join(bt_final_rider_home_effect) %>%
  mutate(rider_home_effect = rider_home_effect + home_effect)

bt_final_strengths <- bt_final_draws %>%
  filter(str_detect(variable, 'alphaD')) %>%
  mutate(date_id = str_match(variable, '\\[(.*)\\]')[,2] %>% as.numeric()) %>%
  left_join(rider_days) %>%
  left_join(olympic_riders, .) %>%
  filter(date == max(date)) %>%
  left_join(bt_final_home_effect) %>%
  replace_na(list(rider_home_effect = 0))

bt_final_strength_summ <- bt_final_strengths %>%
  group_by(rider_name) %>%
  summarise(
    is_not_home = all(is.na(home_effect )),
    q5 = quantile(value + rider_home_effect, 0.05),
    median = quantile(value+ rider_home_effect,  0.5),
    q95 = quantile(value+ rider_home_effect, 0.95)
  ) %>%
  mutate(rider_name = fct_reorder(rider_name,median))

ggplot(bt_final_strength_summ, ) +
  geom_segment(aes(x=q5,xend=q95,y=rider_name,yend=rider_name, color = is_not_home)) +
  labs(y="",x="Strength, α") +
  theme(legend.position = "none")

On reflection the first model development should have been to include the qualifying data - though this is easier to say in hindsight having seen the performance metrics.

Having said this - there is a give away that I got my prioritisation wrong that’s visible in my priors. For the qualifying time difference I felt an appropriate prior would swing an otherwise equal match to give the a rider who qualified 0.2 seconds faster a probability of winning between 0.55 - 0.8; for larger, and still common, differences of 0.4 seconds that grows further. On the other hand I felt that the impact of the home advantage was likely between 0.55 - 0.75.

That’s to say that my priors implicitly show my expectation was that qualifying difference would have a larger effect size than home advantage - and yet I started by coding up the effect I believed to be less important.

To be fair there are mitigating reasons - the home advantage is a text-book adaptation of the model whilst the qualifying difference is bespoke to this application - and required further data preparation.

The prior on the GP model was on a similar scale - and it felt important to let the model pick out trends. However - this is implicitly handled in the qualifying time data (as qualifying time is a predictor of form), and further the GP work required a lot of tweaking for little gain (perhaps some detriment even).

Finally the very early detour to move to the match likelihood was something that principally was the correct change - it does better model the underlying race dynamic - but the key here is how little the change actually has to the model, as the bulk of matches are still single sprints.

So the key take away I’ll pull from this is to prioritise my work flow in the order of my prior distributions on effect sizes. Even if the priors turn out to be way off, its a better justification than a try-it-and-see attitude.

Moving on - In the next post I’ll take the model as it now stands and turn this into betting odds, ahead of events kicking off in the velodrome.

Comments

Acknowledgments

This post makes heavy use of R and Stan, and so benefits from the work of all those who contribute to their development.

Scenario	Sprints Contested	Prob. Occurrence
Wins first two sprints outright	2	\(p^2\)
Wins first and third, loses second	3	\(p(1-p)p\)
Loses first, wins second and third	3	\((1-p)p^2\)
Total		\(p^2 + 2p^2(1-p)\)