When Karl Broman tweeted about his laundry he likely didn’t imagine that people would still be estimating how many socks he washed 7 years later. In this post, my willingness to derive some exact formulae will enable a fully Bayesian, sampling-free approach to laundry quantification.
That the 1st 11 socks in the laundry are each distinct suggests there are a lot more socks. pic.twitter.com/EGSo9P6rw7
— Karl Broman (@kwbroman) October 17, 2014
I belatedly found my way to the puzzle of estimating exactly how many socks Karl Broman washed through Rasmus Bååth’s excellent blog post, which uses the problem to illustrate Approximate Bayesian Computation.
Rasmus wraps up the post by presenting three potential criticisms of his analysis, one of which is: *Why use approximate Bayesian computation at all?* I’ll take up his challenge and derive an explicit formula for the likelihood function, enabling a Bayesian analysis without the need for sampling methods.
I’ll also propose an alternative model to take into account my personal belief that the Tweet was sent in the knowledge that the twelfth sock was going to break the run of distinct socks!
This post makes some assumptions about you: that you have some experience with (or a willingness to learn) statistics and discrete probability.
I’ll assume that you’re familiar with the concepts of Bayesian statistics: a model (the likelihood) describes our understanding of how some data are generated, initial assumptions about plausible parameter values are expressed as prior distributions, and combining these with observed data leads to refined beliefs (posterior distributions).
To follow the derivation of the likelihood you’ll need to know some common constructs from discrete probability/combinatorics: e.g. understanding of binomial coefficients, and how these relate to counting problems.
Our aim is to estimate the total number of socks that Karl Broman washed given his Tweet that the first 11 removed from the washing machine were distinct.
I’ll use the following notation throughout:
\[ \begin{align*} \bf{\text{Data}}\\ d & = \text{No. successive distinct socks observed before Tweeting}\\ \\ \bf{\text{Parameters}}\\ \rho & = \text{No. pairs of socks in the wash}\\ \sigma & = \text{No. singleton socks in the wash} \\ \end{align*} \]
In the case of the Tweet, \(d = 11\). Using two parameters \(\rho\) and \(\sigma\) allows us to handle the scenario that, whilst most socks come in pairs, in some households stray singleton socks are not uncommon. I will, however, assume socks don’t come in multiples of more than two, ruling out the not-uncommon scenario in which two pairs of identical socks are washed.
The likelihood, \(L(d|\rho,\sigma)\), describes how the unknown parameters generate the observed data: assuming there were \(\rho\) pairs of socks and \(\sigma\) singletons, how likely is it that the first \(d\) socks are all distinct?
As a warm up for deriving a complete formula, I’ll consider some of the edge cases which have logical heuristics:
Scenario | Heuristic | Likelihood \(L(d|\rho,\sigma)\) |
---|---|---|
\(\rho < 0\) or \(\sigma < 0\) | Let’s not be silly: you can’t have negative socks. | 0 |
\(d > 2\rho + \sigma\) | We can’t observe more socks than were washed. | 0 |
\(d > \rho + \sigma\) | We can’t observe more distinct socks than the number of distinct socks that were washed. | 0 |
\(\rho = 0, \, \sigma \geq d\) | If all the socks were different, then of course all the observed socks are different. | 1 |
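These edge cases translate directly into guard clauses. As a minimal sketch (the function name is mine; the general case falls through to the formula derived below):

# the edge cases as guard clauses; note that d > rho + sigma also covers
# d > 2*rho + sigma, since 2*rho + sigma >= rho + sigma
sock_likelihood_edge_cases <- function(rho, sigma, d){
  if(rho < 0 || sigma < 0)   return(0)  # no negative socks
  if(d > rho + sigma)        return(0)  # more distinct socks than were washed
  if(rho == 0 && sigma >= d) return(1)  # every sock in the wash was unique
  NA_real_                              # no shortcut: use the general formula
}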
With the edge cases handled, I’ll press on and handle the substantive problem.
Likelihood \[ L(d|\rho,\sigma) = \binom{2\rho + \sigma}{d}^{-1} \sum_{j=0}^{\sigma} 2^{d-j} \binom{\sigma}{j} \binom{\rho}{d-j} \]
Examples
As a soft check that the formula above is correct, let’s take a look at some specific cases.
Consider drawing \(d = 2\) socks from \(\rho = 1\) pair and \(\sigma = 1\) singleton. This is the smallest non-trivial scenario, and we can check it easily by hand. Denoting the socks by {S,P1,P2}, there are three ways to choose two of them: {S,P1}, {S,P2} and {P1,P2}. In two of these the socks are distinct, so the likelihood is 2/3.
Plugging the parameters/data into the formula above: \[ \begin{align} L(2|1,1) & = \binom{3}{2}^{-1} \left\{ 2^2 \binom{1}{0}\binom{1}{2} + 2^1 \binom{1}{1}\binom{1}{1}\right\} = 3^{-1}\left(0 + 2\right) = \frac23 \end{align} \]
The second example takes \(d = 4\) socks drawn from \(\rho = 3\) pairs and \(\sigma = 4\) singletons. Whilst this doesn’t sound much more complex, crunching the numbers directly would be pretty tedious (if admittedly not tricky): the denominator of the likelihood formula tells us there are \(\binom{10}{4} = 210\) scenarios to check.
Evaluating the formula for these parameters indicates the likelihood is \(129/210 \approx 0.614\).
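Expanding the sum term by term (the \(j = 0\) term vanishes, since \(\binom{3}{4} = 0\)): \[ L(4|3,4) = \binom{10}{4}^{-1}\left\{2^3\binom{4}{1}\binom{3}{3} + 2^2\binom{4}{2}\binom{3}{2} + 2^1\binom{4}{3}\binom{3}{1} + 2^0\binom{4}{4}\binom{3}{0}\right\} = \frac{32 + 72 + 24 + 1}{210} = \frac{129}{210}. \]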
To validate this claim, I’ll simulate drawing 4 socks from 3 pairs and 4 singletons, and calculate the proportion of these simulations that return distinct socks.
library(tibble)
library(purrr)

set.seed(1414214)

rho <- 3
sigma <- 4
d <- 4

# vector of all the socks: each pair contributes two socks, each singleton one
all_socks <- c(rep(paste0("P", 1:rho), 2), paste0("S", 1:sigma))

# a function to sample d socks without replacement from all_socks, and return
# 1 if all socks are distinct, and 0 otherwise
sample_socks <- function(all_socks, d){
  sock_sample <- sample(x = all_socks, size = d, replace = FALSE)
  1 * (length(sock_sample) == length(unique(sock_sample)))
}

# draw 100,000 samples
draws <- tibble(draw = map_dbl(1:1e5, ~ sample_socks(all_socks, d)))
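The summary table below can then be reproduced along these lines (a sketch, assuming dplyr; the rendering of the table itself is omitted):

library(dplyr)
draws %>%
  summarise(
    `No. Draws` = n(),
    `No. All Distinct` = sum(draw),
    `Prob. All Distinct` = mean(draw)
  )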
Summary | Value |
---|---|
No. Draws | 100,000 |
No. All Distinct | 61,360 |
Prob. All Distinct | 0.614 |
Proof
\[ L(d|\rho,\sigma) = \binom{2\rho + \sigma}{d}^{-1} \sum_{j=0}^{\sigma} 2^{d-j} \binom{\sigma}{j} \binom{\rho}{d-j} \]
In words, the likelihood is given by the following fraction:
\[ \frac{\text{No. ways to choose d }{\bf{distinct}}\text{ socks from $\rho$ pairs and $\sigma$ singletons.}}{\text{No. ways to choose d socks from $\rho$ pairs and $\sigma$ singletons.}} \]
Starting with the denominator, this is none other than the total number of ways to choose \(d\) objects without replacement from a total of \(2 \rho + \sigma\) (the factor of two is because \(\rho\) pairs of socks equates to \(2 \rho\) individual socks). That is given by the binomial coefficient \(\binom{2\rho + \sigma}{d}\), and explains the leading term in the likelihood formula above.
Turning to the numerator, I’ll break this down by conditioning on the number of singleton socks, counting:
\[\text{No. of ways to choose $d$ distinct socks, given that $j$ of them are singletons.}\]
Starting with the singletons, there are \(\binom{\sigma}{j}\) ways to choose exactly \(j\) of these. The remaining \(d-j\) socks must come from the pairs: the \(\rho\) pairs comprise \(\rho\) distinct sock types, so there are \(\binom{\rho}{d-j}\) ways to choose which types appear. For each of these \(d-j\) types we then need an additional factor of 2, as we could have chosen either of the two (left and right) socks in the pair. Bringing this together, we have:
\[2^{d-j}\binom{\rho}{d-j}\binom{\sigma}{j}.\]
The full formula for the numerator, and then the likelihood, follows by summing over the possible values of \(j = 0,\ldots,\sigma\).
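As an independent check of the derivation, for small parameters we can enumerate every possible draw of \(d\) socks and count directly. A quick sketch (exact_prob_distinct is a throwaway helper, not part of the original analysis):

# enumerate all ways to draw d socks from rho pairs and sigma singletons,
# and compute the exact proportion in which every sock is distinct
exact_prob_distinct <- function(rho, sigma, d){
  all_socks <- c(rep(paste0("P", seq_len(rho)), 2), paste0("S", seq_len(sigma)))
  combos <- combn(length(all_socks), d)
  mean(apply(combos, 2, function(idx){
    chosen <- all_socks[idx]
    length(chosen) == length(unique(chosen))
  }))
}

exact_prob_distinct(rho = 3, sigma = 4, d = 4) # 0.6142857 = 129/210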
For a similar proof, I previously posted an answer to a question on Cross Validated about a slightly different sock-related problem.
In practice when evaluating the likelihood (and later the posterior distribution) there are some computational tricks we employ to avoid running into problems of integer overflow; these are detailed in the end-notes.
In the plot below we visualise the likelihood for the case of interest, \(d = 11\).
# log-likelihood for Baath's model
# logSumExp() is from the matrixStats package
baath_log_likelihood <- function(rho, sigma, d){
  # it is not possible to choose more than rho + sigma distinct socks
  if(d > rho + sigma) return(-Inf)
  # log of each summation term, combined below via the log-sum-exp trick
  log_summation_terms <- purrr::map_dbl(0:min(d, sigma), function(j){
    (d - j) * log(2) + lchoose(sigma, j) + lchoose(rho, d - j)
  })
  log_likelihood <- -lchoose(2*rho + sigma, d) + matrixStats::logSumExp(log_summation_terms)
  return(log_likelihood)
}
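Two quick sanity checks against the examples and edge cases above: \(L(4|3,4) = 129/210\), and the likelihood should be exactly 1 whenever \(\rho = 0\) and \(\sigma \geq d\).

exp(baath_log_likelihood(rho = 3, sigma = 4, d = 4))   # 0.6142857 = 129/210
exp(baath_log_likelihood(rho = 0, sigma = 11, d = 11)) # 1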
# compute grid of likelihood values, fix d = 11
# calculate_sock_grid defined in the Utility Functions appendix
baath_likelihood_grid <-
calculate_sock_grid(
rho_max = 30, sigma_max = 20,
log_likelihood = function(rho,sigma){baath_log_likelihood(rho,sigma,d=11)}
)
# plot likelihood grid
# plot_sock_grid defined in the Utility Functions appendix
plot_sock_grid(baath_likelihood_grid, var = likelihood)
The plot demonstrates why a frequentist, maximum likelihood estimate (MLE) approach to this problem is guaranteed to give unsatisfactory results: it insists that the most likely scenario is that all of the socks in the wash were unique, while being non-committal about how many there were in total (the likelihood is maximised simultaneously by every scenario with \(\sigma \geq 11\) and \(\rho = 0\)).
Even introducing an assumption that there’s at least one pair is unsatisfactory: in this case the maximum is never achieved, as the likelihood keeps increasing as \(\rho,\sigma \rightarrow \infty\).
To avoid getting stuck in trivial edge scenarios, I’ll introduce a prior distribution over \((\rho,\sigma)\) and conduct a Bayesian analysis. The prior captures our beliefs about the number of socks in the washing machine, in the absence of any data.
I’ll put separate priors on the number of singleton and pairs of socks, and then add a restriction that the total socks can’t exceed a fixed (large) amount. This differs from Rasmus’ approach, but produces formulae that are easier to work with: since Rasmus was using sampling, complexity in the formulae wasn’t an issue.
Capping the maximum number of socks has a natural interpretation: we know there are limits on how many socks could fit in a domestic washing machine. More importantly, the cap is essential for computing the normalising constant that crops up when using Bayes rule. To dig into that in more detail, I’ll write \(p(\rho,\sigma)\) for the prior distribution, \(p(\rho,\sigma|d)\) for the posterior distribution, and \(L(d|\rho,\sigma)\) for the likelihood as before; Bayes rule then gives
\[p(\rho,\sigma | d) = Z^{-1}L(d | \rho, \sigma) p(\rho,\sigma), \qquad \text{where } Z = \sum_{\rho,\sigma} L(d | \rho, \sigma) p(\rho,\sigma).\]
For arbitrary choices of priors we cannot expect to be able to derive a formula for \(Z\) directly, and therefore we need to evaluate it computationally. By restricting \(\rho, \,\sigma\) to a finite range we can calculate \(Z\) exactly, without having to make estimates of the tail behaviour.
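Concretely, once the log-likelihood and log-prior have been evaluated on a finite grid, normalising is a single log-sum-exp. A sketch, assuming a data frame grid with one row per \((\rho, \sigma)\) pair and columns log_likelihood and log_prior (hypothetical names):

library(dplyr)
library(matrixStats)

posterior_grid <- grid %>%
  mutate(
    log_posterior_unnorm = log_likelihood + log_prior,
    # subtracting log(Z) = logSumExp(log_posterior_unnorm) normalises
    posterior = exp(log_posterior_unnorm - logSumExp(log_posterior_unnorm))
  )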
The prior distributions on \(\rho\) and \(\sigma\) should be discrete distributions on the non-negative integers. Like Rasmus, I’ll use Negative Binomial distributions, which are a flexible generalisation of the Poisson distribution, allowing us to control the mean and variance separately.
I’ll choose parameters for the Negative Binomial distributions so that the marginal distributions of the product prior closely match those of Rasmus’ prior, which he derived based on assumptions about the plausible amount of washing a family might produce. This leads to me using the priors:
\[\rho \sim \text{NegBinom}\left(3 \frac{1}{4}, \frac{1}{5}\right), \qquad \sigma \sim \text{NegBinom}\left(2, \frac{1}{3}\right).\]
I’ll cap the maximum number of socks that could plausibly have been washed at 300. To justify this: Googling the average weight of a sock returns a range of estimates, with the low end around 50g; 300 socks would therefore come in at 15kg, the largest drum capacity of any domestic washing machine I could find on Amazon!
Writing out the prior explicitly gives:
\[p(\rho,\sigma) = \textstyle \mathbf{\large 1}_{2\rho + \sigma \leq 300} \, \times \, \binom{\rho + 2\frac{1}{4}}{\rho} \left(1 - \frac{1}{5}\right)^{3 \frac{1}{4}} \left(\frac{1}{5}\right)^{\rho} \,\,\times\,\, \binom{\sigma + 1}{\sigma} \left(1 - \frac{1}{3}\right)^{2} \left(\frac{1}{3}\right)^{\sigma}\]
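A sketch of this prior in R. Note that dnbinom parameterises by the success probability, so \(\text{NegBinom}(3\frac14, \frac15)\) above corresponds to size = 3.25, prob = 4/5, and similarly prob = 2/3 for the singletons:

log_prior <- function(rho, sigma, max_socks = 300){
  # the indicator term: rule out more than max_socks individual socks
  if(2*rho + sigma > max_socks) return(-Inf)
  dnbinom(rho, size = 3.25, prob = 4/5, log = TRUE) +
    dnbinom(sigma, size = 2, prob = 2/3, log = TRUE)
}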
The plots/table below compare samples from the product prior and the prior Rasmus uses.
Marginals