This is a home for various mathematical/statistical modelling projects I’ve done for my own pleasure and learning, with the occasional detour into exploratory data analysis, and programming. Common themes will become aparant, but be prepared for analysis of cycling, justice, and bats.
There are plenty of blogs out there explaining how to data wrangle in a given programming language, or implement specific models. I’m indebted to people for creating that content, but my aim is somewhat different. I’m aiming to create case studies for analytical projects: explaining my workflow, why I’ve chosen models, and some of the mathematical rationale/intuition behind these decisions.For that reason, the code (predominantly in R, and Stan) behind these projects is available but by default not shown.
Most of the problems I’m analysing here are in the realm of statistical modelling, which neccessarily brings up the Bayesian/Frequentist debate. I don’t have a formal philosophical stance that one is right and the other is wrong, but my track record indicates that I am infrequently frequentist, or basically Bayesian.
My rationale for leaning towards Bayesian solutions are hardly original, but can be summarised (in decreasing order of significance to me) as:
Bayesian uncertainty (credible) intervals are less prone to misinterpretation than frequentist (confidence) intervals.
Posterior distributions can easily be propogated, enabling us to easily estimate uncertainty for downstream measures of interest.
Real world problems often have small data sets, or large data sets with statistically different sub-populations.I’ve found Bayesian hierarchical modelling to be effective at handling these scenarios.
Deciding whether something is meaningful on the basis of a \(p\)-value is regularly misunderstood, or worse abused.
Its rare that we have no prior knowledge about model parameters. Conversely: normally we have some prior knowledge about parameters, so we should use it… carefully.
For some more nuanced (and authoritative) sources that have lead me to those views:
Richard McElreath has made Chapter 1 of his Statistical Rethinking freely available, and this provides an accessible introduction to the differences in approaches.
Too many posts to mention on andrewgelman.com, but this guest post by Phil Price, and the accompanying comments, is a good starting point.
Frank Harrell’s journey from frequentist to Bayesian statistics.
I’m currently working for the UK Civil Service, as a data scientist within Her Majesties Courts and Tribunals Service (HMCTS) supporting the recovery programme of the courts from the CV19 pandemic.
In previous lives I’ve been the analytical advisor to the senior judiciary of England and Wales, provided analysis for the UK’s EU and International Fisheries negotiations teams ahead of Brexit, and optimised how offenders are allocated to prisons within the English and Welsh prison estate.
Prior to all that I completed a PhD in mathematics at the University of Warwick, specialising in probability theory and statistical physics.