The book I most commonly recommend to physicists and astronomers who want an introduction to Bayesian Inference is “Data Analysis: A Bayesian Tutorial” by Sivia and Skilling. It’s a neat little book that presents things very clearly, perhaps with the exception of the Nested Sampling chapters, which suffer from John Skilling’s curse of knowledge.

Another plus for physicists is the examples, which are all physicsy. [For completeness, the book I would recommend to a statistician would be O’Hagan and Forster. Perhaps surprisingly, I haven’t read Gelman, but I probably should.]

Since Sivia and Skilling are both part of the ‘MaxEnt’ community of Bayesian physicists who think along the lines of Ed Jaynes, there is a chapter on MaxEnt in the book. The idea is introduced using the “kangaroo problem”: assign probabilities to a 2×2 table of possibilities for the (handedness, eye colour) of a kangaroo. A strong argument for MaxEnt is that if one constraint refers to handedness and the other to eye colour, the assignment of probabilities will be independent (this all assumes a uniform “measure”, which I would call a prior, as well).

Unfortunately, this doesn’t seem compelling at all in Sivia’s discussion, because he seems to phrase the problem as though we are *estimating a frequency distribution* (i.e. trying to infer what fraction of kangaroos are right-handed with blue eyes, etc.), as opposed to *assigning a probability distribution to represent our state of knowledge about one single kangaroo*. In the former case, there’s no reason whatsoever to insist on independence and to privilege MaxEnt. The true frequency distribution of kangaroo properties probably *isn’t* one that factorises that way. And anyway, shouldn’t we acknowledge our uncertainty [calculate a posterior distribution over the space of possible frequency distributions] about the result instead of just choosing a point estimate?

If we really are assigning probabilities for a single kangaroo then MaxEnt is applicable for choosing a probability distribution that agrees with the given constraints. Here, independence is really important. The fact that MaxEnt gives you an independent assignment when one constraint is about handedness and the other is about eye colour is fundamental: any method that did something else would be behaving irrationally. It’s just a shame this important and fundamental property of MaxEnt is obscured by the choice of wording in the book.
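The independence property is easy to verify numerically. Here is a minimal sketch (the constraint values of 1/3 are illustrative, not necessarily the numbers used in the book): with only a handedness marginal and an eye-colour marginal fixed, the 2×2 tables consistent with both constraints form a one-parameter family, and the entropy is maximised at the product (independent) table.

```python
import math

# Illustrative constraints for a single kangaroo (made-up values):
# P(blue eyes) = 1/3 and P(left-handed) = 1/3.
a, b = 1 / 3, 1 / 3

def entropy(t):
    """Shannon entropy of the 2x2 table with joint P(blue, left) = t.

    The other three cells are fixed by the two marginal constraints.
    """
    ps = [t, a - t, b - t, 1 - a - b + t]
    return -sum(p * math.log(p) for p in ps if p > 0)

# Tables satisfying both marginals have one degree of freedom, t.
# Scan its allowed range and pick the maximum-entropy table.
lo, hi = max(0.0, a + b - 1), min(a, b)
ts = [lo + (hi - lo) * i / 10000 for i in range(10001)]
t_best = max(ts, key=entropy)

print(t_best)   # ≈ 1/9 = a*b, i.e. the independent assignment
```

Any other table in the family (any `t` other than `a*b`) encodes a correlation between handedness and eye colour that the constraints give us no reason to believe in, which is exactly the sense in which a non-MaxEnt assignment would be behaving irrationally.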


## About Brendon J. Brewer

I am a senior lecturer in the Department of Statistics at The University of Auckland. Any opinions expressed here are mine and are not endorsed by my employer.

Maximum entropy is a worthwhile principle, but statistical problems are, in my opinion, so varied that “the toolbox” for dealing with them needs to be richer than some practitioners imagine. In particular, I highly urge students of Bayesian problems to embrace and employ hierarchical models.

To that end, there are four texts I recommend.

First, John Kruschke, *Doing Bayesian Data Analysis*, now in its second edition.

Second, Lunn, Jackson, Best, Thomas, and Spiegelhalter, *The BUGS Book*.

Third, Peter Congdon’s series of textbooks, from *Bayesian Statistical Modelling* to *Applied Bayesian Modelling* to *Bayesian Models for Categorical Data*.

Fourth, the third edition of *Bayesian Data Analysis*, now by Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin.

Also, this is a rapidly changing field. Y’need to pay attention to new stuff on arXiv.org. There’s also a lot of neat stuff not in the main fields, but key and profound, in my opinion, e.g., S. N. Wood, “Statistical inference for noisy nonlinear ecological dynamic systems”, *Nature*, 466(26), August 2010, http://dx.doi.org/10.1038/nature09319; Jasra, Kantas, and Ehrlich, “Approximate Inference for Observation Driven Time Series Models with Intractable Likelihoods”, *ACM Transactions on Modeling and Computer Simulation*, 24(3), 2014; and Martin, McCabe, Maneesoonthorn, and Robert, “Approximate Bayesian Computation in State Space Models”, http://arxiv.org/abs/1409.8363.

I read through the section on Poisson statistics in Sivia and was thoroughly disappointed by the fact that the authors promote the use of normal statistics instead of simply using the correct Poisson likelihood. This was enough for me to put down the book and not open it again. I would, therefore, encourage you to look for another book to commonly recommend to people. Maybe one of those mentioned by the hypergeometric would be a good candidate.
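The difference this comment objects to is easy to see numerically. A minimal sketch with a made-up datum (n = 3 counts), comparing the exact Poisson log-likelihood with the normal approximation that sets mean = variance = λ: at low counts the two disagree noticeably, and the Poisson likelihood is skewed where the Gaussian is symmetric.

```python
import math

# Hypothetical low-count datum: n = 3 events observed, model rate lam.
n = 3

def poisson_loglike(lam):
    """Exact Poisson log-likelihood, log P(n | lam)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def gaussian_loglike(lam):
    """Normal approximation with mean = variance = lam."""
    return -0.5 * math.log(2 * math.pi * lam) - (n - lam) ** 2 / (2 * lam)

# Both peak near lam = n, but the tails differ: the Poisson
# log-likelihood falls off asymmetrically around the peak.
for lam in [1.0, 3.0, 6.0]:
    print(lam, poisson_loglike(lam), gaussian_loglike(lam))
```

For large n the two curves converge, which is why the normal approximation is common, but for the small counts typical of photon-counting astronomy the exact Poisson likelihood costs nothing extra to use.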