The book I most commonly recommend to physicists and astronomers who want an introduction to Bayesian inference is “Data Analysis: A Bayesian Tutorial” by Sivia and Skilling. It’s a neat little book that presents things very clearly, with the possible exception of the Nested Sampling chapters, which suffer from John Skilling’s curse of knowledge.
Another plus for physicists is the examples, which are all physicsy. [For completeness, the book I would recommend to a statistician would be O’Hagan and Forster. Perhaps surprisingly, I haven’t read Gelman, but I probably should.]
Since Sivia and Skilling are both part of the ‘MaxEnt’ community of Bayesian physicists who think along the lines of Ed Jaynes, there is a chapter on MaxEnt in the book. The idea is introduced using the “kangaroo problem”: assign probabilities to a 2 x 2 table of possibilities for the (handedness, eye colour) of a kangaroo. A strong argument for MaxEnt is that if one constraint refers only to handedness and the other only to eye colour, the resulting assignment treats the two properties as independent: each joint probability is the product of the corresponding marginal probabilities. (This all assumes a uniform “measure”, which I would call a prior.)
Unfortunately, this doesn’t seem compelling at all in Sivia’s discussion, because he seems to phrase the problem as though we are estimating a frequency distribution (i.e. trying to infer what fraction of kangaroos are right-handed with blue eyes, etc.), as opposed to assigning a probability distribution to represent our state of knowledge about one single kangaroo. In the former case, there’s no reason whatsoever to insist on independence and to privilege MaxEnt. The true frequency distribution of kangaroo properties probably isn’t one that factorises that way. And anyway, shouldn’t we acknowledge our uncertainty about the result, by calculating a posterior distribution over the space of possible frequency distributions, instead of just choosing a point estimate?
If we really are assigning probabilities for a single kangaroo, then MaxEnt is applicable for choosing a probability distribution that agrees with the given constraints. Here, independence is really important. The fact that MaxEnt gives you an independent assignment when one constraint is about handedness and the other is about eye colour is fundamental: any method that did something else would be asserting a connection between handedness and eye colour that nothing in the constraints supports, and so would be behaving irrationally. It’s just a shame this important and fundamental property of MaxEnt is obscured by the choice of wording in the book.
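To make the independence property concrete, here is a minimal sketch in Python. It assumes illustrative constraints of P(left-handed) = 1/3 and P(blue eyes) = 1/3 (these numbers are my choice for the example, not something the argument depends on) and a uniform measure. With both marginals fixed, the 2 x 2 table has a single free cell, x = P(left-handed, blue eyes), so the sketch just searches for the x that maximises the entropy. The function names are made up for this example.

```python
import math

def entropy(table):
    """Shannon entropy (in nats) of a list of probabilities; 0*log(0) is taken as 0."""
    return -sum(p * math.log(p) for p in table if p > 0)

def joint_table(x, p_left, p_blue):
    """The 2 x 2 joint distribution with the given marginals and free cell x = P(left, blue)."""
    return [
        x,                          # left-handed, blue eyes
        p_left - x,                 # left-handed, not blue eyes
        p_blue - x,                 # right-handed, blue eyes
        1.0 - p_left - p_blue + x,  # right-handed, not blue eyes
    ]

def maxent_cell(p_left, p_blue, iters=200):
    """Ternary search for the x that maximises the entropy (which is concave in x)."""
    lo = max(0.0, p_left + p_blue - 1.0)  # smallest x keeping every cell non-negative
    hi = min(p_left, p_blue)              # largest x keeping every cell non-negative
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if entropy(joint_table(m1, p_left, p_blue)) < entropy(joint_table(m2, p_left, p_blue)):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Illustrative (assumed) constraints: one on handedness, one on eye colour.
p_left, p_blue = 1 / 3, 1 / 3
x_star = maxent_cell(p_left, p_blue)
print("MaxEnt P(left, blue):      ", x_star)
print("Product of the marginals:  ", p_left * p_blue)
```

The two printed numbers should agree to within the precision of the search, and the same holds for any other pair of marginals you try: setting the derivative of the entropy with respect to x to zero gives x = P(left-handed) × P(blue eyes) exactly.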