Over on the Bayesian Philosophy blog, there is an interesting short post discussing whether non-informative probability distributions exist. There are several different ways to get from intuitions about informativeness to formal principles (e.g. the principle of indifference, transformation groups, MaxEnt, Jeffreys priors, reference priors, default priors). Since all of these ideas are prone to misuse, it’s tempting to declare them dead ends, but I think this is a mistake. I think the first three principles in the list are fundamentally important, and the others are at least worth thinking about.
I have previously criticised notions of “complete ignorance”, or of having “no prior information”, which I would like to distinguish from the above principles. However, regardless of the arguments for or against non-informative probability distributions, it’s worth noticing how frequently they are used in practice. You may not see mention of entropy maximisation or transformation groups or symmetry when you take a probability course or read about a data analysis. But a lot of what you do see can be derived from one or more of the above principles. Here are some common disguises:
i) The phrase “at random”. For example, there are 10 balls in an urn and one is selected at random. This is a disguise for the principle of indifference.
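To make the disguise concrete, here is a minimal Python sketch (my own illustration, with the urn example's numbers): "selected at random" means no available information distinguishes the balls, so indifference assigns each the same probability.

```python
# Ten balls in an urn, one "selected at random".
balls = [f"ball_{i}" for i in range(1, 11)]

# Principle of indifference: nothing distinguishes the balls,
# so each gets equal probability 1/N.
p = {ball: 1 / len(balls) for ball in balls}

print(p["ball_1"])   # 0.1
# The probabilities sum to 1 (up to floating-point rounding).
```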
ii) The common practice of equating a probability to a frequency can be derived by applying the principle of indifference on an appropriate hypothesis space. For example, if 2% of people in Auckland have disease X, the probability that a particular Aucklander, about whom nothing else is known, has disease X is also 2%. This is what you’d get by enumerating the set of all Aucklanders and applying indifference over that set.
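The enumeration argument can be sketched in a few lines of Python. The population size and case count below are illustrative stand-ins, not real Auckland figures:

```python
# Hypothetical numbers: a stand-in population and 2% disease frequency.
population = 100_000    # "all Aucklanders" (illustrative)
has_disease = 2_000     # 2% of them carry disease X

# Indifference over the enumerated population: the person in question is
# equally likely to be any member, so the probability is the ratio of
# favourable members to all members, i.e. the frequency.
p_disease = has_disease / population
print(p_disease)   # 0.02, matching the 2% frequency
```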
iii) Exponential family conditional priors (‘sampling distributions’), e.g. the normal, Poisson, and exponential, can be derived fairly straightforwardly from MaxEnt.
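A small numerical illustration of this point (my own sketch, not from the post): maximising entropy on a discrete support subject to a fixed mean yields the exponential-family form p(x) ∝ exp(-λx), with λ chosen to satisfy the constraint. Here λ is found by bisection, since the mean is strictly decreasing in λ:

```python
import math

# Support 0..50 and a mean constraint; values are illustrative.
xs = list(range(51))
target_mean = 5.0

def mean_for(lam):
    """Mean of the distribution p_i proportional to exp(-lam * x_i)."""
    w = [math.exp(-lam * x) for x in xs]
    z = sum(w)
    return sum(x * wi for x, wi in zip(xs, w)) / z

# The mean decreases as lam increases, so bisect to hit the constraint.
lo, hi = -1.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) > target_mean:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

# Normalise the MaxEnt distribution and check its mean.
w = [math.exp(-lam * x) for x in xs]
z = sum(w)
probs = [wi / z for wi in w]
print(round(sum(x * p for x, p in zip(xs, probs)), 6))   # 5.0
```

The exponential shape is not assumed anywhere; it is what the entropy-maximising solution under a mean constraint looks like. Adding a variance constraint would give a (discretised) normal in the same way.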
iv) Uniform priors (or related approximations such as maximum likelihood), as well as log-uniform priors, are very popular. Most of the listed principles produce these in some circumstances.
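The link between uniform priors and maximum likelihood can be shown directly: with a flat prior the posterior is proportional to the likelihood, so the posterior mode (MAP) coincides with the MLE. A hedged Python sketch with illustrative binomial data and a grid search over the parameter:

```python
import math

# Illustrative data: 7 successes in 20 binomial trials.
n, k = 20, 7
thetas = [i / 1000 for i in range(1, 1000)]   # grid over (0, 1)

def log_likelihood(t):
    return k * math.log(t) + (n - k) * math.log(1 - t)

log_prior = 0.0   # uniform prior on (0, 1): constant, so log-prior = 0

mle = max(thetas, key=log_likelihood)
map_est = max(thetas, key=lambda t: log_likelihood(t) + log_prior)
print(mle, map_est)   # 0.35 0.35 -- flat prior makes MAP equal the MLE k/n
```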
v) Assumptions of independence, such as the “iid” conditional priors you see everywhere. People use these even when all sorts of correlated distributions are available. Independence is what MaxEnt will produce if no input information links one proposition A to another proposition B. Independence is ignorance. If knowing B can change your probability for A, you are informed.
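This can be checked numerically (my own illustration, with made-up marginals): among all joint distributions with the same marginals for A and B, the product (independent) joint has the highest entropy, and in any correlated alternative, knowing B shifts the probability of A.

```python
import math

# Illustrative marginals for two binary propositions A and B.
pA, pB = 0.3, 0.6

def entropy(joint):
    return -sum(p * math.log(p) for p in joint if p > 0)

# Independent joint: p(a, b) = p(a) * p(b), in order (AB, Ab, aB, ab).
indep = [pA * pB, pA * (1 - pB), (1 - pA) * pB, (1 - pA) * (1 - pB)]

# A correlated joint with the SAME marginals: shift mass by eps along
# the diagonal (the row and column sums are unchanged).
eps = 0.05
corr = [pA * pB + eps, pA * (1 - pB) - eps,
        (1 - pA) * pB - eps, (1 - pA) * (1 - pB) + eps]

print(entropy(indep) > entropy(corr))       # True: independence maximises entropy
print(round(corr[0] / (corr[0] + corr[2]), 4))   # P(A|B) = 0.3833, not the marginal 0.3
```

In the correlated joint, conditioning on B moves P(A) from 0.3 to about 0.38, i.e. B is informative about A; MaxEnt only produces that if some input information links the two.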
None of this proves anything about the merits or drawbacks of proposed principles for assigning probabilities. What it does show is that deeper reasons are available for a lot of the standard things everyone does. So if we decide we want to improve on the standard things, we have somewhere to look.