Ed Jaynes frequently criticised what he called the *mind projection fallacy*. This is the implicit assumption that our concepts of, say, gaussianity, randomness, and so on, reflect properties to be found “out there in nature”, whereas they often reside only in the structure of our thinking. I’ve brought this up a few times on the blog, in old posts such as Stochastic and Testing Model Assumptions.

Learning basic Haskell has given me another way to think and talk about this issue without getting mired in metaphysics or discussions about subjectivity and objectivity. This post about random sequences is an instance of the kind of thing I’m talking about here.

Haskell is a strongly typed language, like C++. Whenever you define something, you always have to say what type of thing it is. This is pretty much the same idea as in mathematics when you say what set something is in. For example, in the following code `x` is an integer and `f` is a function that takes two integers as input and returns a string. Each definition is preceded by a *type signature* which just describes the type. Functions and variables (i.e., constants ;-)) both have types.

```haskell
x :: Int
x = 5

f :: Int -> Int -> String
f x y = if x > y then "Hello" else "Goodbye"
```

Jared wrote his PhD dissertation about how Haskell’s sophisticated type system can be used to represent probability distributions in a very natural way. For example, in your program you might have a type `Prob Int` to represent probability distributions over `Int`s, `Prob Double` to represent probability distributions over `Double`s, and so on. The `Prob` type has a type parameter, like templates in C++ (e.g., `std::vector<T>` defines vectors of all types, and `std::vector<int>` and `std::vector<double>` have definite values for the type parameter `T`). The `Prob` type has instances for `Functor`, `Applicative`, and `Monad`, for those who know what that means. It took me about six months to understand this, and now that I roughly do, it all seems obvious. But hopefully I can make my point without much of that stuff.
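To make the idea concrete, here’s a minimal toy sketch of such a type — my own illustration, not the actual implementation from Jared’s dissertation. A discrete distribution is represented as a list of value–probability pairs, and the `Functor` instance transforms the values while leaving the probabilities alone.

```haskell
-- A toy discrete distribution: a list of (value, probability) pairs.
-- An illustrative sketch only, not Jared's actual implementation.
newtype Prob a = Prob { runProb :: [(a, Double)] }

instance Functor Prob where
    -- Transforming the values gives the distribution of the
    -- transformed variable; the probabilities are untouched.
    fmap f (Prob xs) = Prob [ (f x, p) | (x, p) <- xs ]

-- A fair six-sided die, as a Prob Int.
die :: Prob Int
die = Prob [ (x, 1/6) | x <- [1..6] ]
```

With this, `fmap (*2) die` is the distribution of twice a die roll, still of type `Prob Int`.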

Many instances of the mind projection fallacy turn out to be simple type errors. More specifically, they are attempts to apply functions which require an argument of type `Prob a` to input that’s actually of type `a`. For example, suppose I define a function which tests whether a distribution over a vector of doubles is gaussian or not:

```haskell
isGaussian :: Prob (Vector Double) -> Bool
```

If I tried to apply that to a `Vector Double`, I’d get a compiler error. There’s no mystery to that, and no need for any confusion as to why you can’t do it. It’s like trying to take the logarithm of a string.

Some of the confusion is understandable because our theories have many functions in them, with some functions being similar to each other in name and concept. For example, I might have two similarly-named functions `isGaussian1` and `isGaussian2`, which take a `Prob (Vector Double)` and a `Vector Double` as input respectively:

```haskell
isGaussian1 :: Prob (Vector Double) -> Bool
isGaussian1 pdf = if pdf is a product of iid gaussians
                  then True
                  else False

isGaussian2 :: Vector Double -> Bool
isGaussian2 vec = if a histogram of vec's values looks gaussian
                  then True
                  else False
```
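The bodies above are pseudocode, but just to show the shape of the `Vector Double -> Bool` version, here’s a crude, hypothetical stand-in for `isGaussian2`, using a plain list in place of `Vector`. It declares the data “gaussian-looking” if the sample skewness and excess kurtosis are both small; the thresholds are arbitrary choices for illustration, and a real test (e.g. Shapiro–Wilk) would be more principled.

```haskell
-- Crude, hypothetical stand-in for isGaussian2, using a list in
-- place of Vector: accept the data as "gaussian-looking" if its
-- sample skewness and excess kurtosis are both small. The 0.5 and
-- 1.0 thresholds are arbitrary.
isGaussian2 :: [Double] -> Bool
isGaussian2 xs = abs skew < 0.5 && abs exKurt < 1.0
  where
    n        = fromIntegral (length xs)
    mean     = sum xs / n
    moment k = sum [ (x - mean)^k | x <- xs ] / n
    sd       = sqrt (moment 2)
    skew     = moment 3 / sd^3
    exKurt   = moment 4 / sd^4 - 3.0
```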

It would be an instance of the mind projection fallacy if I started mixing these two functions up willy-nilly. For example, if I say “my statistical test assumed normality, so I looked at a histogram of the data to make sure that assumption was correct”, I am jumping from `isGaussian1` to `isGaussian2` mid-sentence, assuming that they are connected (which is often true) and that the connection works in the way I guessed it does (which is often false).

Here’s another example using statistics notions. I could define two standard deviation functions, for the so-called ‘population’ and ‘sample’ standard deviations:

```haskell
sd1 :: Prob Double -> Double
sd2 :: Vector Double -> Double
```
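For concreteness, here are hedged sketches of both, with the distribution again represented as a list of (value, probability) pairs standing in for `Prob Double`, and a plain list standing in for `Vector Double`:

```haskell
-- 'Population' standard deviation of a discrete distribution,
-- given as (value, probability) pairs standing in for Prob Double:
sd1 :: [(Double, Double)] -> Double
sd1 dist = sqrt (expect (^2) - expect id ^ 2)
  where expect f = sum [ p * f x | (x, p) <- dist ]

-- 'Sample' standard deviation (with Bessel's correction), which
-- acts on the data itself:
sd2 :: [Double] -> Double
sd2 xs = sqrt (sum [ (x - mean)^2 | x <- xs ] / (n - 1))
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n
```

Note the types: `sd1` consumes a distribution, `sd2` consumes data. Confusing the two is exactly the kind of type error at issue.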

Statisticians are typically good at keeping these conceptually distinct, since they learn them at the very beginning of their training. On the other hand, here’s an example that physicists tend to get mixed up about:

```haskell
entropy  :: Prob Microstate -> Double
disorder :: Microstate -> Double
```

While there is no standard definition of disorder, no definition can make it equivalent to Gibbs/Shannon entropy, which is a property of the probability distribution, not the state itself. Physicists with their concepts straight should recognise the compiler error right away.

I mentioned above that `Prob` has a `Functor` instance. That means you can take a function of type `a -> b` and apply it to a `Prob a`, yielding a `Prob b` (this is analogous to how you might apply a function to every element of a vector, yielding a vector whose elements all have the new type). The interpretation in probability is that you get the probability distribution for the transformed version of the input variable. So, any time you have a function `f :: a -> b`, you also get a function `fmap f :: Prob a -> Prob b`. This doesn’t get you a way around the mind projection fallacy, though. For that, the type would need to be `Prob a -> b`.

Here’s a way in which entropy can sort of be a function of the microstate, in a particular sense. First, set up the prior over microstates x. Then imagine conditioning on some function of x, such as the energy E(x). This induces a posterior over x, which has an entropy. That posterior entropy can be considered as a function of E, which is in turn a function of x. So you can set up a function that takes an x to an entropy, through intermediate stages involving a probability distribution. However, this isn’t what people mean when they blithely say things like “entropy measures disorder”.
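As a coda, here’s what the `Prob a -> Double` side of the entropy signature looks like in code, with the distribution once more sketched as (state, probability) pairs. Note that the input really is a distribution, not a single microstate:

```haskell
-- Gibbs/Shannon entropy of a discrete distribution, given as
-- (state, probability) pairs. The state type 'a' is never inspected:
-- entropy is a property of the probabilities alone.
entropy :: [(a, Double)] -> Double
entropy dist = negate (sum [ p * log p | (_, p) <- dist, p > 0 ])
```

A uniform distribution over two microstates has entropy log 2 regardless of what the states are, and a distribution concentrated on one state has entropy zero — there is no way to recover a per-state “disorder” from this signature.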