Methinks it is like a weasel

This is my first proper blog post for a while. Apologies for the gap. I have been busy with visits from three of my favourite colleagues (Kevin Knuth, Daniela Huppenkothen, and Dan Foreman-Mackey), followed by teaching an undergraduate course for which I had to learn HTML+CSS, XML, and databases (aside: SQL is cool and I wish I had learned it earlier). Somewhere in there, Lianne and I managed to buy our first house as well. Hopefully that’s enough excuses!

Earlier this year, the physics department had a visit from prominent astrostatistician Daniel Mortlock, who gave a good introductory talk about “Bayesian model selection”. He gave the standard version of the story where the goal is to calculate posterior model probabilities (as opposed to a literal selection of a model, which is a decision theory problem). During the presentation, he claimed that you shouldn’t use this theory to calculate the posterior probability of a hypothesis you only thought of because of the data. I thought this was a weird claim, so I disputed it, which was fun, but didn’t resolve the issue on the spot.

Here’s why I think Mortlock’s advice is wrong. Probabilities measure how plausible a proposition is, in the context of another proposition being known. Equivalently, they measure the degree to which one proposition implies another. For example, a posterior probability P(H|D, I) is the probability of statement H given D and I, or the degree to which D implies H in the context of I. To calculate it, you use Bayes’ rule. The posterior probability of H equals the prior times the likelihood divided by the marginal likelihood. There’s no term in the equation for when or why you thought of H.
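For reference, the rule described in words above can be written out using the same symbols as the rest of this post (H for the hypothesis, D for the data, I for the prior information):

```latex
P(H \mid D, I) = \frac{P(H \mid I)\, P(D \mid H, I)}{P(D \mid I)}
```

Every factor on the right-hand side is conditioned only on H, D, and I; none of them refers to when H was first written down.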

Still, I can see why Mortlock would have given his recommendation; it was a warning against the Bayesian equivalent of “p-hacking”. Every dataset will contain some meaningless anomalies, and it’s possible to construct an analysis that makes an anomaly appear meaningful when it isn’t.

A super-simple example will help here (I’ve used this example before, and it’s basically Ed Jaynes’ “sure thing hypothesis”). Consider a lottery with a million tickets. Consider the hypotheses H_0: the lottery was fair, and H_1: the lottery was rigged to make ticket number 227,354 win. And let D be the proposition that 227,354 indeed won. The likelihoods are P(D|H_0, I) = 10^{-6} and P(D|H_1, I) = 1. Wow! A strong likelihood ratio in favour of H_1. With prior probabilities of 0.5 each, the posterior probabilities of H_0 and H_1 are 1/1,000,001 and 1,000,000/1,000,001 respectively. Whoa. The lottery was almost certainly rigged!
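The arithmetic in this two-hypothesis version is easy to check by hand, but here is a minimal Python sketch of it anyway. The helper name `posterior` and the dictionary keys are just labels I made up for illustration; exact fractions are used so that the result matches the numbers quoted above without rounding.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule over a discrete, mutually exclusive set of hypotheses."""
    # Prior times likelihood for each hypothesis.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Marginal likelihood P(D|I): the sum of the joint terms.
    marginal = sum(joint.values())
    return {h: joint[h] / marginal for h in joint}


n_tickets = 1_000_000

# The "silly" prior: 50/50 between a fair lottery and one specific rigging.
priors = {"H0": Fraction(1, 2), "H1": Fraction(1, 2)}

# P(D|H0, I) = 1/1,000,000 and P(D|H1, I) = 1.
likelihoods = {"H0": Fraction(1, n_tickets), "H1": Fraction(1)}

post = posterior(priors, likelihoods)
print("P(H0|D, I) =", post["H0"])  # 1/1000001
print("P(H1|D, I) =", post["H1"])  # 1000000/1000001
```

Nothing in `posterior` knows or cares when the hypotheses were thought of; the only inputs are the priors and the likelihoods.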

Common sense says this conclusion is silly, and Mortlock’s warning would have prevented it. Okay, but is there a better way to prevent it? There is. We can assign more sensible prior probabilities. P(H_0 | I) = P(H_1 | I) = 1/2 is silly because it would have implied P(D|I) \approx 1/2, i.e. that we had some reason to suspect ticket number 227,354 (and assign a 50% probability to it winning) before we knew that was the outcome. If, for example, we had considered a set of “rigged lottery” hypotheses \{H_1, ..., H_{1,000,000}\}, one for each ticket, and divided half the prior probability among them, then we’d have gotten the “obvious” result, that D is uninformative about whether the lottery was fair or not.
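Here is a rough Python sketch of both claims in the paragraph above. It assumes (my choice, for illustration) that the half of the prior probability given to “rigged” is split equally among the million tickets, and that under “rigged for ticket k” ticket k wins with probability 1. The variable names are made up for this sketch.

```python
from fractions import Fraction

n_tickets = 1_000_000
winner = 227_354  # the ticket that D says won

like_fair = Fraction(1, n_tickets)  # P(D|H_0, I)

# Claim 1: the 50/50 prior over {fair, rigged for the winner} implies
# P(D|I) is about 1/2 before seeing the data.
silly_marginal = Fraction(1, 2) * like_fair + Fraction(1, 2) * 1
print("P(D|I) under the silly prior:", float(silly_marginal))

# Claim 2: spread the "rigged" half of the prior over all tickets.
prior_fair = Fraction(1, 2)
prior_each_rigged = Fraction(1, 2) / n_tickets  # assumed equal split

# Under "rigged for ticket k", D has probability 1 if k == winner and 0
# otherwise, so only one rigged hypothesis contributes to P(D|I).
marginal = prior_fair * like_fair + prior_each_rigged * 1

post_fair = prior_fair * like_fair / marginal
post_rigged_winner = prior_each_rigged * 1 / marginal

print("P(D|I) under the spread-out prior:", marginal)
print("P(fair|D, I) =", post_fair)
print(f"P(rigged for {winner}|D, I) =", post_rigged_winner)
```

With these assumptions the posterior probability of a fair lottery stays at 1/2, the same as its prior. All of the “rigged” probability ends up on the hypothesis matching the winning ticket, but the fair-versus-rigged question is left exactly where it started.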

The take-home message here is that you can use Bayesian inference to calculate the plausibility of whatever hypotheses you want, no matter when you thought of them. The only risk is that you might be inclined to assign bad prior probabilities that sneakily include information from the data. The prior probabilities describe the extent to which hypotheses are implied by the prior information. If they do that, you’ll be fine.

About Brendon J. Brewer

I am a senior lecturer in the Department of Statistics at The University of Auckland. Any opinions expressed here are mine and are not endorsed by my employer.
This entry was posted in Inference, Personal.

3 Responses to Methinks it is like a weasel

  1. G. Belanger says:

    Jaynes is the master at bringing sensible thinking to physical problems and resolving apparent weaknesses or paradoxes based on faulty thinking. Great post! Thanks.

  2. Tim Vaughan says:

    Hi Brendon, thanks for the great post. The only thing I’d maybe change is the suggestion that the equal weighting of the two hypotheses in the lottery example is merely silly. Assuming that the information “I” is what you’ve provided (that there is a lottery involving a million tickets and that it may or may not be rigged), doesn’t the principle of indifference _mandate_ that P(H_i|I) be equal for all H_i where the indices label hypotheses that differ only by a permutation of ticket labels?

    • I agree the principle of indifference is compelling in this example. In more complicated scenarios, it’s less obvious whether there is such a symmetry in the prior information, and if so, on what “level” the symmetry applies (e.g. sometimes it seems reasonable to assert that a marginal distribution should have a certain symmetry, rather than the full joint distribution having the symmetry).
