This is my first proper blog post for a while. Apologies for the gap. I have been busy with visits from three of my favourite colleagues (Kevin Knuth, Daniela Huppenkothen, and Dan Foreman-Mackey), followed by teaching an undergraduate course for which I had to learn HTML+CSS, XML, and databases (aside: SQL is cool and I wish I had learned it earlier). Somewhere in there, Lianne and I managed to buy our first house as well. Hopefully that’s enough excuses!
Earlier this year, the physics department had a visit from prominent astrostatistician Daniel Mortlock, who gave a good introductory talk about “Bayesian model selection”. He gave the standard version of the story where the goal is to calculate posterior model probabilities (as opposed to a literal selection of a model, which is a decision theory problem). During the presentation, he claimed that you shouldn’t use this theory to calculate the posterior probability of a hypothesis you only thought of because of the data. I thought this was a weird claim, so I disputed it, which was fun, but didn’t resolve the issue on the spot.
Here’s why I think Mortlock’s advice is wrong. Probabilities measure how plausible a proposition is, in the context of another proposition being known. Equivalently, they measure the degree to which one proposition implies another. For example, a posterior probability is the probability of statement given and , or the degree to which implies in the context of . To calculate it, you use Bayes’ rule. The posterior probability of equals the prior times the likelihood divided by the marginal likelihood. There’s no term in the equation for when or why you thought of .
Still, I can see why Mortlock would have given his recommendation; it was a warning against the Bayesian equivalent of “p-hacking“. Every dataset will contain some meaningless anomalies, and it’s possible to construct an analysis that makes an anomaly appear meaningful when it isn’t.
A super-simple example will help here (I’ve used this example before, and it’s basically Ed Jaynes’ “sure thing hypothesis”). Consider a lottery with a million tickets. Consider the hypotheses : The lottery was fair, and : the lottery was rigged to make ticket number 227, 354 win. And let be the proposition that 227, 354 indeed won. The likelihoods are and . Wow! A strong likelihood ratio in favour of . With prior probabilities of 0.5 each, the posterior probabilities of and are 1/1,000,001 and 1,000,000/1,000,001 respectively. Whoa. The lottery was almost certainly rigged!
Common sense says this conclusion is silly, and Mortlock’s warning would have prevented it. Okay, but is there a better way to prevent it? There is. We can assign more sensible prior probabilities. is silly because it would have implied , i.e. that we had some reason to suspect ticket number 227, 354 (and assign a 50% probability to it winning) before we knew that was the outcome. If, for example, we had considered a set of “rigged lottery” hypotheses , one for each ticket, and divided half the prior probability among them, then we’d have gotten the “obvious” result, that is uninformative about whether the lottery was fair or not.
The take home message here is that you can use Bayesian inference to calculate the plausibility of whatever hypotheses you want, no matter when you thought of them. The only risk is that you might be inclined to assign bad prior probabilities that sneakily include information from the data. The prior probabilities describe the extent to which hypotheses are implied by the prior information. If they do that, you’ll be fine.