When I first moved from a physics department to a statistics department, I was a bit nervous about whether worldview collisions would occur with my coworkers, and how they would play out. From what I’ve heard, things have improved a lot over the last few decades. Bayesians, frequentists, and (by far the most common nowadays) whatever-you-prefer pragmatists, now work together with minimal friction. There was the time someone from my department called all of Bayesian inference “bullshit” at a Steven Novella talk, which was amusing, but that’s far from the typical experience.
Despite the friendly atmosphere, there is still a lot to be debated, and when the topics come up, all sorts of interesting things can happen, such as the following (poorly reconstructed from memory) conversation I had a couple of weeks ago. I’ve done my best not to misrepresent my colleague’s views.
COLLEAGUE: General complaints about “statistical significance” and arbitrary thresholds such as p-values being less than 0.05.
ME: I agree. Additional complaints about people still using p-values in the first place.
COLLEAGUE: Whoa there. The problem isn’t so much with p-values as it is with arbitrary thresholds, publication bias, and all that. P-values aren’t supposed to be used to make hard decisions.
ME: What’s your view on how p-values should be used?
COLLEAGUE: Scientists are interested in whether certain hypotheses are plausible or not, so they do experiments and gather data. The p-value is an intuitive device which measures strength of evidence, and they can use this to shift their attitudes to the hypotheses.
ME: That sounds very Bayesian in spirit.
COLLEAGUE: It is. I just prefer Fisher’s methods, because they’re easier and require fewer inputs.
ME: I technically agree about fewer inputs, but the answers to the questions we ask actually depend on those inputs, which is something we should understand and acknowledge. My hunch is that Fisher’s methods only seem easier because we emphasise them so heavily in our curricula, but that’s an empirical claim which would need investigation.
COLLEAGUE: Scientists aren’t interested in inference per se, like you are, and don’t care about how the answer is sensitive to certain inputs. They just want to know more about their own topic, such as whales, medicines, or galaxies.
ME: Okay, that’s true. I think it’s cool that we can measure the masses of black holes using reverberation mapping (for example). I don’t actually care what the masses turn out to be, but my colleagues do. Different strokes for different folks. But what would you do if some scientists had a problem where the Bayesian conclusion was very sensitive to the priors, and they couldn’t agree on a sensible choice of prior? I’d just present the sensitivity analysis.
COLLEAGUE: I would offer them a p-value instead. Presenting the sensitivity analysis would be needlessly confusing, and only of interest to statisticians. The only justification needed for providing a p-value is that it convinces actual scientists.
This was a fascinating discussion for me, partly because we found far more common ground than I would have expected, yet still finished on a point of fundamental disagreement.
Scientists are interested in the plausibility of various hypotheses, and come to statisticians because we have technical expertise in that area. Upon finding out that the plausibility of a hypothesis (e.g. “the medicine is equivalent to a placebo”) depends on an uncomfortably subjective input (e.g. “without knowing the data, how plausible would it be that the medicine has a tiny/small/moderate/large effect size?”), the appropriate response is full disclosure, not muddying the waters. Switching to a p-value is a form of bait-and-switch advertising. The customer asked for a bicycle, and we sold them a unicycle instead, on the basis that “hey, at least you’ll only need one wheel!”.
The fact that it convinces real scientists is also a bit worrisome. Scientists are supposed to use logical reasoning. If a scientist thinks a “null hypothesis” is implausible because of a low p-value, without reflecting on how small the effect size might be if it’s nonzero, they are reasoning illogically and missing a learning opportunity. Why does the result depend on this input in this way? If we can’t agree on what the inputs should be, where are we allowed to look for more information? These questions are not particularly difficult, and they would become easier over time if we spent more of our curricula teaching our students about their existence.
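To make the prior sensitivity concrete, here’s a minimal sketch (not from the conversation; all numbers are made up for illustration). Suppose we observe k “successes” in n trials and compare a point null H0: p = 0.5 against an alternative H1 with a Beta(a, a) prior on p, where a controls how concentrated the prior is around 0.5. The posterior probability of H0 can be computed in closed form:

```python
# Illustrative prior-sensitivity sketch: same data, different priors under H1.
from math import comb, exp, lgamma

n, k = 100, 60  # hypothetical data: 60 successes in 100 trials

def betaln(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def marginal_h0():
    """Marginal likelihood of the data under H0: p = 0.5 exactly."""
    return comb(n, k) * 0.5 ** n

def marginal_h1(a):
    """Marginal likelihood under H1, averaging the binomial likelihood
    over a Beta(a, a) prior on p (the beta-binomial distribution)."""
    return comb(n, k) * exp(betaln(k + a, n - k + a) - betaln(a, a))

# Posterior probability of H0, assuming equal prior odds on the hypotheses,
# for several widths of the prior on p under H1. The data never change;
# only the prior does, yet P(H0 | data) shifts noticeably.
for a in [0.5, 1, 5, 50]:
    m0, m1 = marginal_h0(), marginal_h1(a)
    print(f"Beta({a}, {a}) prior under H1 -> P(H0 | data) = {m0 / (m0 + m1):.2f}")
```

With these (invented) data the two-sided p-value sits near the conventional threshold, yet the posterior probability of the null moves substantially as the prior under H1 is widened or narrowed. A sensitivity analysis would present exactly this range of answers to the scientists, rather than replacing it with a single number.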