Much Ado About Nothing

I just met up with a new student of mine and gave her some warm-up questions to get familiar with some of the things we’ll be working on. This involved “differential entropy”, which is basically Shannon entropy but for continuous distributions.

H = -\int f(x) \log f(x) \, dx
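As a quick numerical sanity check of this definition (a sketch of mine, not from the post), here is the integral evaluated for the exponential distribution with rate 1, whose differential entropy is known analytically to be 1 nat:

```python
import numpy as np

# Check H = -∫ f(x) log f(x) dx for Exp(1), with density f(x) = exp(-x)
# on x > 0. The known analytic answer is 1 - log(rate) = 1 nat.
x = np.linspace(0.0, 50.0, 200001)  # truncate the tail; exp(-50) is negligible
f = np.exp(-x)
H = np.trapz(-f * np.log(f), x)     # trapezoidal approximation to the integral
print(H)  # ≈ 1.0
```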

An intuitive interpretation of this quantity, loosely speaking, is as the generalisation of “log-volume” to non-uniform distributions. If you changed parameterisation from x to x’, you’d stretch the axes and end up with different volumes, so H is not invariant under changes of coordinates. Using this quantity implicitly brings in a notion of volume, based on a flat measure that would not remain flat under arbitrary coordinate changes. In principle, you should give the measure explicitly (or use relative entropy/KL divergence instead), but it’s no big deal if you don’t, as long as you know what you’re doing.
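The stretching effect is easy to see with a uniform distribution, whose differential entropy is exactly the log-volume of its support. A minimal sketch (the helper function name is mine, for illustration only):

```python
import numpy as np

# Differential entropy of Uniform(a, b) is log(b - a): literally the
# log-volume of the support.
def uniform_entropy(a, b):
    return np.log(b - a)

# Under the reparameterisation x' = 2x, Uniform(0, 1) becomes Uniform(0, 2):
H = uniform_entropy(0.0, 1.0)            # log(1) = 0
H_stretched = uniform_entropy(0.0, 2.0)  # log(2): stretching the axis added log-volume
print(H, H_stretched)
```

Same underlying uncertainty, different coordinates, different H; exactly the behaviour you would expect from a log-volume.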

For some reason, in this particular situation only, people wring their hands over the lack of invariance and say this makes the differential entropy wrong or bad or something. For example, the Wikipedia article states

Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions. Unfortunately, Shannon did not derive this formula, and rather just assumed it was the correct continuous analogue of discrete entropy, but it is not.[citation needed] The actual continuous version of discrete entropy is the limiting density of discrete points (LDDP).

(Aside: this LDDP thing comes from Jaynes, who I think was awesome, but that doesn’t mean he was always right or that people who’ve read him are always right.)

Later on, the Wikipedia article suggests something is strange about differential entropy because it can be negative, whereas discrete Shannon entropy is non-negative. Well, volumes can be less than 1, whereas counts of possibilities cannot be. Scandalous!

This is probably a side effect of the common (and unnecessary) shroud of mystery surrounding information theory. Nobody would be tempted to edit the Wikipedia page on circles to say things like “the area of a circle is not really \pi r^2, because if you do a nonlinear transformation the area will change.” On second thoughts, many maths Wikipedia pages do degenerate into fibre bundles rather quickly, so maybe I shouldn’t say nobody would be tempted.

About Brendon J. Brewer

I am a senior lecturer in the Department of Statistics at The University of Auckland. Any opinions expressed here are mine and are not endorsed by my employer.
This entry was posted in Entropy, Information. Bookmark the permalink.

2 Responses to Much Ado About Nothing

  1. G. Belanger says:

    Hi Brendon! I assume that you have already corrected and expanded the Wikipedia page that is the subject of this post. If not, you really should! The reliability of Wikipedia depends on people who can actually correct mistakes when they identify them.

    • Brendon J. Brewer says:

      I’d probably be getting myself into a pointless fight if I tried since this is largely a matter of interpretation and taste. Maybe when I finish my book I can use it as a reference to bolster any edits.
