An article about experts, and joining Heterodox Academy

Any followers of this blog who aren’t on Facebook or Twitter might be interested in this article I wrote about expertise and its limits.

In related news, I have signed up as a supporter of Heterodox Academy, a group advocating viewpoint diversity in academia as a bulwark against groupthink, both within specific research areas and in the greater academic community. If you know me, you probably know that I used to be quite left-wing and a supporter of all the popular causes that academics tend to be enthusiastic about. Over the last 18 months I have become disillusioned and frustrated with much of this. If someone says something popular but false (or at least debatable), it shouldn’t require bravery to openly question it. But it does, and that’s the opposite of a good strategy for pursuing truth.

Posted in Personal | 6 Comments

A Co-Blogger!

Luke Barnes reminds readers of his blog Letters to Nature that it’s not just his blog. 🙂

Posted in Uncategorized | Leave a comment

Methinks it is like a weasel

This is my first proper blog post for a while. Apologies for the gap. I have been busy with visits from three of my favourite colleagues (Kevin Knuth, Daniela Huppenkothen, and Dan Foreman-Mackey), followed by teaching an undergraduate course for which I had to learn HTML+CSS, XML, and databases (aside: SQL is cool and I wish I had learned it earlier). Somewhere in there, Lianne and I managed to buy our first house as well. Hopefully that’s enough excuses!

Earlier this year, the physics department had a visit from prominent astrostatistician Daniel Mortlock, who gave a good introductory talk about “Bayesian model selection”. He gave the standard version of the story where the goal is to calculate posterior model probabilities (as opposed to a literal selection of a model, which is a decision theory problem). During the presentation, he claimed that you shouldn’t use this theory to calculate the posterior probability of a hypothesis you only thought of because of the data. I thought this was a weird claim, so I disputed it, which was fun, but didn’t resolve the issue on the spot.

Here’s why I think Mortlock’s advice is wrong. Probabilities measure how plausible a proposition is, in the context of another proposition being known. Equivalently, they measure the degree to which one proposition implies another. For example, a posterior probability P(H|D, I) is the probability of statement H given D and I, or the degree to which D implies H in the context of I. To calculate it, you use Bayes’ rule. The posterior probability of H equals the prior times the likelihood divided by the marginal likelihood. There’s no term in the equation for when or why you thought of H.
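In symbols, with the same notation as above, Bayes’ rule reads:

P(H | D, I) = \frac{P(H | I) \, P(D | H, I)}{P(D | I)}

and nothing on the right hand side refers to when or why H was proposed.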

Still, I can see why Mortlock would have given his recommendation; it was a warning against the Bayesian equivalent of “p-hacking“. Every dataset will contain some meaningless anomalies, and it’s possible to construct an analysis that makes an anomaly appear meaningful when it isn’t.

A super-simple example will help here (I’ve used this example before, and it’s basically Ed Jaynes’ “sure thing hypothesis”). Consider a lottery with a million tickets. Consider the hypotheses H_0: the lottery was fair, and H_1: the lottery was rigged to make ticket number 227,354 win. And let D be the proposition that 227,354 indeed won. The likelihoods are P(D|H_0, I) = 10^{-6} and P(D|H_1, I) = 1. Wow! A strong likelihood ratio in favour of H_1. With prior probabilities of 0.5 each, the posterior probabilities of H_0 and H_1 are 1/1,000,001 and 1,000,000/1,000,001 respectively. Whoa. The lottery was almost certainly rigged!
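As a quick numerical check (a sketch in R, using the priors and likelihoods just stated):

```r
# Equal prior probabilities for H0 (fair) and H1 (rigged for the winning ticket)
prior = c(0.5, 0.5)

# Likelihoods P(D|H0,I) and P(D|H1,I)
likelihood = c(1e-6, 1)

# Bayes' rule: posterior is proportional to prior times likelihood
posterior = prior*likelihood/sum(prior*likelihood)
print(posterior)   # 1/1,000,001 and 1,000,000/1,000,001
```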

Common sense says this conclusion is silly, and Mortlock’s warning would have prevented it. Okay, but is there a better way to prevent it? There is. We can assign more sensible prior probabilities. P(H_0 | I) = P(H_1 | I) = 1/2 is silly because it would have implied P(D|I) \approx 1/2, i.e. that we had some reason to suspect ticket number 227,354 (and assign a 50% probability to it winning) before we knew that was the outcome. If, for example, we had considered a set of “rigged lottery” hypotheses \{H_1, ..., H_{1,000,000}\}, one for each ticket, and divided half the prior probability among them, then we’d have gotten the “obvious” result: D is uninformative about whether the lottery was fair or not.
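Numerically (again, a sketch in R), spreading half the prior mass over the million rigged-lottery hypotheses gives:

```r
# H0 gets prior 0.5; each rigged hypothesis H_k gets 0.5/10^6
prior_H0 = 0.5
prior_Hk = 0.5/1e6

# P(D|H0,I) = 1e-6; only the hypothesis naming the winning ticket has likelihood 1,
# and the other 999,999 rigged hypotheses have likelihood 0, so they drop out
numerator = prior_H0 * 1e-6
posterior_H0 = numerator/(numerator + prior_Hk * 1)
print(posterior_H0)   # 0.5: D says nothing about whether the lottery was fair
```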

The take-home message here is that you can use Bayesian inference to calculate the plausibility of whatever hypotheses you want, no matter when you thought of them. The only risk is that you might be inclined to assign bad prior probabilities that sneakily include information from the data. The prior probabilities should describe the extent to which the hypotheses are implied by the prior information alone, not the data. If they do that, you’ll be fine.

Posted in Inference, Personal | 3 Comments

Second article on Quillette

For anyone who missed it, I had another article published in the online science & politics magazine Quillette. In this one, I describe Ed Jaynes’ view of the second law of thermodynamics, and how it’s really little more than the sum rule of probability theory.

Posted in Uncategorized | 1 Comment

The probability of a Mormon second coming

In a recent episode of his podcast, author Sam Harris reiterated an observation about probability theory. The broader context was to criticize the popular notion that all religions are the same. They aren’t — some specific propositions associated with religions are more plausible than others, and their consequences if believed and acted upon also vary. The probabilistic point was that the second coming of Jesus envisioned by Mormons is ‘objectively less plausible’ than a generic Christian version. Commentator Cenk Uygur then responded, saying that this is nonsense because the probability of both is zero if atheism is true (he also replaced the generic Christian version with a specific Christian version so he was talking about different propositions from Harris). The purpose of this post won’t surprise my readers: I’m going to pick nits about what probability theory actually says.

If we consider the proposition, associated with more traditional versions of Christianity, that Jesus will return to Earth to judge the living and the dead, and label this proposition A, then given background information I this has probability P(A | I) (the probability of A given I).

Now consider the proposition, associated with Mormonism, that Jesus will return to Earth to judge the living and the dead and this will occur in the US state of Missouri. The first part of this proposition is A, but a second proposition B (about it happening in Missouri) has been attached via the and operator. Given information I, the probability is P(A, B | I).

Probability theory says that P(A, B | I) \leq P(A | I) for any propositions A, B, and I. Applying it to our case, the probability of the Mormon proposition must be less than **or equal to** the probability of the Christian one. I put “or equal to” in bold because it’s the nit I want to pick in Harris’s original statement. Probability theory itself says \leq. I think any reasonable person would assign probabilities such that the strict inequality < applies, but that’s not a property of every possible probability assignment.
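The inequality follows in one line from the product rule:

P(A, B | I) = P(A | I) \, P(B | A, I) \leq P(A | I),

since P(B | A, I), like any probability, cannot exceed one.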

That adding extra stipulations with “and” can only decrease the plausibility (or keep it the same) isn’t just a consequence of probability theory; it’s a core part of the arguments for why probability theory applies to rational degrees of plausibility in the first place.

What happens if we do as Uygur did, and consider another proposition C, which is like B but specifies the location as Jerusalem instead of Missouri? Then probability theory in itself doesn’t constrain the values of P(A, B | I) and P(A, C | I). However, I’d assign a greater, but still small, probability to the latter.

Now what happens if we do another thing Uygur did, which is assert that anyone associated with the word atheist (even though they do not like the term and would prefer it went away) should assign precisely zero probability to all of these propositions? Nothing much changes about the above discussion. Define I_2 as the proposition God doesn’t exist and Jesus was a regular person and will never return. Then, as probability theory requires, P(A, B | I_2) \leq P(A | I_2). It just so happens that both are zero (again, given I_2), and it’s the equality part of \leq that applies.

Posted in Inference | 2 Comments

Come work with us!

Our department is hiring. The purpose of this post is simply to share my colleague Thomas Lumley’s post. Please share with anyone who might be interested in applying!

Posted in Uncategorized | Leave a comment

Article in Quillette + Technical Details

Followers of my blog who haven’t already seen it might enjoy this opinion piece I wrote for Quillette magazine. I have wanted to write something like that for a while, but never thought I’d get around to it. I’m glad I proved myself wrong!

Here are the technical details of the calculations I did to get the results in the article. If you’re comfortable with an analytical derivation, see this one by Jared Tobin.

For both the Bayesian and frequentist calculations I used a binomial conditional prior for the data (aka likelihood) for the number of recoveries x:

x \,|\, \theta \sim \textnormal{Binomial}(100, \theta).

For the Bayesian analysis the prior for \theta was a 50/50 mixture of a uniform prior from 0 to 1, and a delta function at \theta=0.7 (the old drug’s effectiveness):

p(\theta) = \frac{1}{2}\delta(\theta - 0.7) + \frac{1}{2}

where \theta \in [0, 1] (the constant \frac{1}{2} is the uniform part, since the uniform density on [0, 1] equals one). Is this prior debatable? Yes, just like the prior for the data. The conclusion of an argument can change if you change the premises.

Here’s R code to calculate the posterior:

# Theta values, prior, and likelihood function
# theta[7001] is exactly 0.7
theta = seq(0, 1, length.out=10001)
likelihood = dbinom(83, prob=theta, size=100)

# Prior (discrete approximation): half the mass spread uniformly,
# half on the point theta = 0.7
prior = rep(0.5/10000, 10001)
prior[7001] = 0.5

# Posterior
posterior = prior*likelihood/sum(prior*likelihood)

# Probability new drug is worse, same, better
print(sum(posterior[theta < 0.7]))
print(posterior[7001])
print(sum(posterior[theta > 0.7]))

Here is R code for the (two-sided) p-value:

# Possible data sets
x = seq(0, 100)

# p(x | H0)
p = dbinom(x, prob=0.7, size=100)

# P(x <= 57 | H0): 57 is as far below the H0 mean (70) as 83 is above it
prob1 = sum(p[x <= 57])

# P(x >= 83 | H0)
prob2 = sum(p[x >= 83])

# Print p-value
print(prob1 + prob2)
Posted in Inference, Personal | 1 Comment