We’re usually taught that it’s important to test model assumptions. For example, if your inference assumes the probability distribution for the data is a normal/gaussian distribution, then you should look at a histogram of the data to see if it actually looks like a gaussian. If it doesn’t, then you have a large risk of getting wrong or misleading answers. Or so we’re told.
Consider the following simple example. There is a parameter $\mu$ which we want to learn from 1000 data values $\{x_1, x_2, \ldots, x_{1000}\}$, whose sampling distribution is $\mathcal{N}(\mu, \sigma^2)$, with known standard deviation $\sigma$, independently for each data point. As I have argued elsewhere, the sampling distribution is really a prior for the data given the parameters, and has the same logical status as any other prior distribution. This prior doesn’t imply that a histogram of the $x$-values has to look like a gaussian, but this hypothesis (call it “gaussianity”) does have quite a high prior probability.
With an improper uniform prior for $\mu$, the posterior for $\mu$ given the data is also a gaussian with mean $\bar{x}$ (i.e. the arithmetic mean of the data), and standard deviation $\sigma/\sqrt{1000}$, where $\sigma$ is the known standard deviation of the sampling distribution. What we usually want to see is that the posterior density is high at the true value of $\mu$. Let’s say that if the true value of $\mu$ is within 1.96 standard deviations of the peak of the posterior, then our inference has been “successful”. The prior probability of “success”, according to the joint prior (uniform for $\mu$ times gaussian for the data given $\mu$), is 95%.
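For readers who want the missing step, here is a sketch of why the posterior depends on the data only through the arithmetic mean. It assumes the setup described above (independent gaussian sampling distribution, flat prior), and writes $\sigma$ for the known standard deviation of the sampling distribution and $n = 1000$ for the number of data points; both symbols are notation chosen here for illustration.

```latex
\begin{align*}
p(\mu \mid x_1, \ldots, x_n)
  &\propto \prod_{i=1}^{n} \exp\!\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]
   = \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] \\
\sum_{i=1}^{n}(x_i - \mu)^2
  &= \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \\
\Rightarrow\quad p(\mu \mid x_1, \ldots, x_n)
  &\propto \exp\!\left[-\frac{(\mu - \bar{x})^2}{2\,(\sigma/\sqrt{n})^2}\right]
\end{align*}
```

The term $\sum_i (x_i - \bar{x})^2$ does not involve $\mu$, so it is absorbed into the normalising constant. Everything about the "shape" of the data lives in that discarded term.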
An important question is whether there is any relationship between “gaussianity” of the data, and whether the data would yield a “successful” posterior distribution. According to folklore, there should be a strong relationship: if the data are highly non-gaussian, then our inferences should be suspect, right? Well, we can test that. We know the prior probability of success is 95%, but what’s the prior probability of success given non-gaussianity? To compute this I simulated fake parameter values and datasets from the joint prior, and calculated whether the data set was gaussian (I used a classical test to do this, and called the data non-gaussian if the p-value was below 0.0001), and whether the inference was successful. Counting the fraction of inferences that were successful when the data were deemed non-gaussian gives us the result,
wait for it…
basically 95%! That’s right, the proposition that “the data look gaussian” has nothing whatsoever to do with whether the inference is successful or not! Therefore doing such a ‘check’ is basically superstition. To test this further I reduced the p-value threshold to a much smaller value (small enough that I had to use Nested Sampling to compute the results), and the results were unchanged. So what does produce a successful inference? In this example it’s easy to say. Since the posterior only depends on the arithmetic mean of the data, any data set whose mean is close to the true value of $\mu$ will result in success, no matter how the data points are actually “distributed”.
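The experiment is easy to sketch in a few lines of Python. This is not the code behind the numbers above, just a minimal illustration under some loudly stated assumptions: the Jarque-Bera test stands in for the unspecified "classical test", the improper uniform prior is replaced by a wide bounded one (the result does not depend on the true value), the sampling standard deviation is set to 1, and the default p-value threshold is a loose 0.05 so that plain simulation finds enough "non-gaussian" datasets quickly. All function names are made up for this example.

```python
import math
import random


def jarque_bera_pvalue(xs):
    """Asymptotic p-value of the Jarque-Bera normality test (a stand-in test)."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    m2 = sum(d * d for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # JB is asymptotically chi-squared with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def is_successful(xs, mu, sigma=1.0):
    """True if mu lies within 1.96 posterior standard deviations of the data mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return abs(xbar - mu) <= 1.96 * sigma / math.sqrt(n)


def success_rates(n_sims=2000, n_data=1000, sigma=1.0, p_threshold=0.05, seed=0):
    """Return (P(success), P(success | non-gaussian), number of non-gaussian datasets)."""
    rng = random.Random(seed)
    n_success = n_nongauss = n_both = 0
    for _ in range(n_sims):
        # Assumption: a wide bounded prior replaces the improper uniform prior.
        mu = rng.uniform(-10.0, 10.0)
        xs = [rng.gauss(mu, sigma) for _ in range(n_data)]
        success = is_successful(xs, mu, sigma)
        nongauss = jarque_bera_pvalue(xs) < p_threshold
        n_success += success
        n_nongauss += nongauss
        n_both += success and nongauss
    conditional = n_both / n_nongauss if n_nongauss else float("nan")
    return n_success / n_sims, conditional, n_nongauss
```

Calling `success_rates()` should give two fractions that are both close to 0.95, up to simulation noise. For a more extreme case, the dataset `[2.0] * 500 + [4.0] * 500` fails the normality test spectacularly, yet `is_successful` returns `True` for a true value of 3.0, because the mean is exactly right.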
Real inference problems are more complicated than this example. Of course it is possible to get misleading results when there are problems with your priors. The way to improve this situation is to put more thought into your priors, not to apply superstitions.