## Social media is a flawed concept

I enjoyed this critique on the Lunduke Show, which I found via LBRY.

https://spee.ch/@Lunduke:e/social-media-is-a-flawed-concept

## Types and Jaynes’s ‘Mind Projection Fallacy’

Ed Jaynes frequently criticised what he called the mind projection fallacy: the implicit assumption that our concepts of, say, gaussianity, randomness, and so on, reflect properties to be found “out there in nature”, whereas they often reside only in the structure of our thinking. I’ve brought this up a few times on the blog, in old posts such as Stochastic and Testing Model Assumptions.

Learning basic Haskell has given me another way to think and talk about this issue without getting mired in metaphysics or discussions about subjectivity and objectivity. This post about random sequences is an instance of the kind of thing I’m talking about here.

Haskell is a strongly typed language, like C++. Whenever you define something, you always have to say what type of thing it is. This is pretty much the same idea as in mathematics, when you say what set something belongs to. For example, in the following code `x` is an integer and `f` is a function that takes two integers as input and returns a string. Each definition is preceded by a type signature, which just declares the type. Functions and variables (i.e., constants ;-)) both have types.

```
x :: Int
x = 5

f :: Int -> Int -> String
f x y = if x > y then "Hello" else "Goodbye"
```

Jared wrote his PhD dissertation about how Haskell’s sophisticated type system can be used to represent probability distributions in a very natural way. For example, in your program you might have a type `Prob Int` to represent probability distributions over `Int`s, `Prob Double` to represent probability distributions over `Double`s, and so on. The `Prob` type has a type parameter, like templates in C++ (e.g., `std::vector<T>` defines vectors of all types, and `std::vector<int>` and `std::vector<double>` have definite values for the type parameter `T`). The `Prob` type has instances for `Functor`, `Applicative`, and `Monad` for those who know what that means. It took me about six months to understand this and now that I roughly do, it all seems obvious. But hopefully I can make my point without much of that stuff.

Many instances of the mind projection fallacy turn out to be simple type errors. More specifically, they are attempts to apply functions which require an argument of type `Prob a` to input that’s actually of type `a`. For example, suppose I define a function which tests whether a distribution over a vector of doubles is gaussian or not:

```
isGaussian :: Prob (Vector Double) -> Bool
```

If I tried to apply that to a `Vector Double`, I’d get a compiler error. There’s no mystery to that, or need for any confusion as to why you can’t do it. It’s like trying to take the logarithm of a string.

Some of the confusion is understandable because our theories have many functions in them, with some functions being similar to each other in name and concept. For example, I might have two similarly-named functions `isGaussian1` and `isGaussian2`, which take a `Prob (Vector Double)` and a `Vector Double` as input respectively:

```
isGaussian1 :: Prob (Vector Double) -> Bool
isGaussian1 pdf = if pdf is a product of iid gaussians then True else False

isGaussian2 :: Vector Double -> Bool
isGaussian2 vec = if a histogram of vec's values looks gaussian then True else False
```

It would be an instance of the mind projection fallacy if I started mixing these two functions up willy-nilly. For example if I say “my statistical test assumed normality, so I looked at a histogram of the data to make sure that assumption was correct”, I am jumping from `isGaussian1` to `isGaussian2` mid-sentence, assuming that they are connected (which is often true) and that the connection works in the way I guessed it does (which is often false).

Here’s another example using statistics notions. I could define two standard deviation functions, for the so-called ‘population’ and ‘sample’ standard deviations:

```
sd1 :: Prob Double -> Double

sd2 :: Vector Double -> Double
```
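As a concrete illustration, here is a minimal sketch of `sd2`, using a plain Haskell list in place of `Vector Double` so the snippet is self-contained (the list representation is my substitution, not part of the original signatures):

```haskell
-- A minimal sketch of the 'sample' standard deviation sd2, using a plain
-- list instead of Vector Double so the snippet stands alone. The (n - 1)
-- denominator is the usual sample-variance convention.
sd2 :: [Double] -> Double
sd2 xs = sqrt (sumSq / fromIntegral (n - 1))
  where
    n     = length xs
    m     = sum xs / fromIntegral n
    sumSq = sum [(x - m) ^ 2 | x <- xs]
```

Note that nothing here takes a `Prob Double` as input: `sd2` only ever sees a concrete vector of data, which is exactly the type distinction the two signatures are drawing.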

Statisticians are typically good at keeping these conceptually distinct, since they learn them at the very beginning of their training. On the other hand, here’s an example that physicists tend to get mixed up about:

```
entropy :: Prob Microstate -> Double

disorder :: Microstate -> Double
```

While there is no standard definition of disorder, no definition can make it equivalent to Gibbs/Shannon entropy, which is a property of the probability distribution, not the state itself. Physicists with their concepts straight should recognise the compiler error right away.
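To make the distinction concrete, here is a toy Gibbs/Shannon entropy in Haskell, with the distribution represented as a bare list of probabilities (a stand-in of my own for `Prob Microstate`, chosen so the snippet stands alone):

```haskell
-- Toy Gibbs/Shannon entropy. The input is a whole finite probability
-- distribution, represented here as a plain list of probabilities
-- (a stand-in for Prob Microstate). No individual microstate appears
-- anywhere in the input type.
entropy :: [Double] -> Double
entropy ps = negate (sum [p * log p | p <- ps, p > 0])
```

A uniform distribution over two microstates gives `log 2`, and a distribution concentrated on a single microstate gives zero; with this type there is simply no way to ask for the “entropy” of one microstate.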

I mentioned above that `Prob` has a `Functor` instance. That means you can take a function with type `a -> b` and apply it to a `Prob a`, yielding a `Prob b` (this is analogous to how you might apply a function to every element of a vector, yielding a vector output where every element has the new type). The interpretation in probability is that you get the probability distribution for the transformed version of the input variable. So, any time you have a function `f :: a -> b`, you also get a function `fmap f :: Prob a -> Prob b`. This doesn’t get you a way around the mind projection fallacy, though. For that, the type would need to be `Prob a -> b`.
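Here’s a minimal sketch of how that looks, using a toy finite-distribution `Prob` of my own (a list of outcome–probability pairs, not the representation from the dissertation mentioned above):

```haskell
-- A toy Prob for finite distributions: outcome-probability pairs.
newtype Prob a = Prob [(a, Double)]

instance Functor Prob where
    -- fmap transforms every outcome and leaves the probabilities alone,
    -- giving the distribution of the transformed variable.
    fmap f (Prob outcomes) = Prob [(f x, p) | (x, p) <- outcomes]

-- A fair die: a Prob Int.
die :: Prob Int
die = Prob [(k, 1 / 6) | k <- [1 .. 6]]

-- even :: Int -> Bool, so fmap even :: Prob Int -> Prob Bool.
parity :: Prob Bool
parity = fmap even die
```

Note that `fmap even` still lands in `Prob Bool`, not `Bool`: the `Functor` instance never hands you a function of type `Prob a -> b`.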


## That’s not what I meant…

A lot of my research over the years has involved fitting images. Usually, I use the traditional assumption that the conditional prior for the data pixels given the model pixels is iid normal with mean zero. But in some cases one might want to use a correlated distribution instead. That’s a technical challenge which I’ve been trying to solve from several angles (though good solutions might be well known to people who know these things).

I got one of those working the other day, and was rather amused at some of the output. Check out these residuals of one posterior sample of a gravitational lens fit (bottom right). The “correlated noise” likelihood is quite happy to call this a decent fit! 🙂

## Wasteful products

A relative of mine recently remarked that consumer products involving less waste (that is, less plastic packaging and so on) usually cost more. The same is true when you compare electric cars to petrol cars, and I’m sure you can think of many other examples. But are we missing something here?

Suppose I can choose between brand A and brand B for my purchases of some product, and both meet my purposes equally well. Suppose also that A comes with lots of plastic packaging and costs \$10, and B does not, yet costs \$13. If I don’t care about the plastic (either in its convenient or annoying effects on me, or its real or imagined environmental impact once discarded), it would be a bad decision to buy product B, at least on my side of the transaction. I could use the \$3 saved by buying A for something else I need, or even donate it to the Against Malaria Foundation.

On the other hand, if we think that the harm caused by discarding this particular unit of plastic packaging is worth the sacrifice of the \$3, then purchasing B would be the right decision. In order to really decide, we would somehow have to know the extent of this impact and to what extent someone does value it, or ought to value it (that is, weigh it against the benefits of the \$3). This is the kind of situation that Economics 101 usually presents with its solution of a Pigouvian tax to offset the external cost, but in my story I’ve put that decision in the hands of the purchaser (Economics 101 doesn’t usually mention the fact that the people setting the level of the tax might not have every incentive to do so accurately, if it is even possible to define that).

However, here’s a tempting line of reasoning that I think is wrong. We should buy product B because it’s less wasteful, in that it takes fewer resources to produce. Just look at the wasteful packaging A has and B doesn’t have! But this neglects the fact that the price is conveying information about things we could rarely learn in all their complexity, such as how valuable the alternative uses are of the resources that went into producing A and B. If B is more expensive, then it is very plausible that it takes more resources to produce it. Perhaps it takes some scarce specialised ingredient, or highly skilled labour (which also has alternative uses), etc. I am reminded of a time a friend lamented the tremendous resources going into producing plastic cutlery, which is so resource intensive that you can buy huge packets of it for a few dollars. If plastic cutlery were truly ridiculously wasteful, you’d be having trouble budgeting for it and find the need to cut back. But most people don’t have to worry that much about their plastic cutlery expenditure.

In the long run it would be exciting if electric cars became cheaper than petrol ones, but for now they aren’t, which means they almost certainly are taking more resources to produce. The external costs here might justify the difference, and many people make that case (more catastrophic views of climate change would imply this, while it’s-one-issue-among-many views would not). But the case isn’t made by pointing at the lack of a tailpipe and forgetting to weigh that against the thousands of dollars at the other end of the transaction. Similarly, if a kind of paper costs twice as much, a business is putting itself at slightly higher risk of going bankrupt by purchasing that kind of paper. Ironically, pursuing environmental sustainability at any price (literally) can mean both wasting more resources and making activities less sustainable.

## Some software I’ve been working on

Hi everyone. I’ve recently been expanding my computational horizons and it has helped me to put some of my trans-dimensional Bayesian inference codes for astronomy into a more usable state for others, with configuration files in YAML, nonzero documentation (though still not thorough), and things like that. I might eventually write up software paper(s) if one becomes popular.

As usual, my methods are not the fastest; they’re best used on smaller datasets where you care about being a purist and getting the most out of the data that you can.

Here are a few links:

StarStudded: Probabilistic catalogs for crowded stellar fields https://github.com/eggplantbren/StarStudded

Lensing2: Gravitational lens reconstruction with substructures https://github.com/eggplantbren/Lensing2

Celery: Recover quasi-periodic oscillations from a time series (think asteroseismology) https://github.com/eggplantbren/Celery

Any feedback is appreciated!

## DNest4 version 0.2.0

G’day everyone. This post is to update you on some work I’ve been doing (along with some collaborators, notably João Faria) on DNest4. So, you’ll likely only care if you’re a user of that package. There have been enough additions recently that I’ve decided to call it a new version, 0.2.0. Everything is backwards compatible, so don’t stress. But some of the new features might be useful to you.

What are they? Here’s a partial list:

### Probability Distribution Classes

There is now a selection of classes available to represent some standard-ish probability distributions. This can make it easier to implement the `from_prior` and `perturb` member functions in a model. For example, suppose you want a normal prior centered at 30 with a standard deviation of 10. In the past, you had to implement it like this:
```
#include <cmath>

void MyModel::from_prior(DNest4::RNG& rng)
{
    x = 30.0 + 10.0*rng.randn();
}

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    logH -= -0.5*pow((x - 30.0)/10.0, 2);
    x += rng.randh();
    logH += -0.5*pow((x - 30.0)/10.0, 2);
    return logH;
}
```

Now you can use the class `DNest4::Gaussian`, an object of which represents a gaussian (normal) distribution. An alternative to the above is then:
```
#include "DNest4/code/Distributions/Gaussian.h"

void MyModel::from_prior(DNest4::RNG& rng)
{
    DNest4::Gaussian gaussian(30.0, 10.0);
    x = gaussian.generate(rng);
}

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    DNest4::Gaussian gaussian(30.0, 10.0);
    logH += gaussian.perturb(x, rng); // Perturbs x in-place
    return logH;
}
```
This means you don’t have to remember a bunch of details about normal distributions; those are taken care of by the `Gaussian` class. In principle, instead of constructing a new `Gaussian` object twice, I should have just put one in the `MyModel` class and used it in both `from_prior` and `perturb`, but you get the idea.

The collection of available distributions is a bit eclectic (look in the Distributions subdirectory to see them). I’m happy to accept pull requests if anyone wants to implement boring old things like gammas. 🙂 I might get around to it at some point.

### Better automatic detection of the number of levels

One of the annoying things about Nested Sampling is deciding how much compression is needed, i.e., how many iterations (in classic NS) or how many levels (in Diffusive NS). For a while, DNest4 has tried to guess the appropriate number of levels on the fly if you set the maximum number of levels to 0 in your OPTIONS file. However, the method I used previously was failing on some problems: when the mixing wasn’t that good, it wouldn’t create quite enough levels.

This is always a risk, but I think the modified version of the recipe I used for this is better than the old one. I’ll spare you the details.

### More aggressive compilation options

In all the Makefiles, compilation is now done with `-O3 -march=native`, which can slightly improve performance. However, it means any binaries generated may be suboptimal (or even crash) if you try to run them on a different computer from where you compiled them. I don’t think anyone is doing this, but in case you are, you’ll just need to get rid of the `-march=native`.

### An efficiency trick: pre-rejection

This is something I should have been doing ages ago, but some of the examples now demonstrate it. Suppose your model has an expensive part in it, so its perturb function looks like this:
```
double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    logH += something.perturb(rng); // Perturb some parameters
    expensive();                    // Do an expensive computation
    return logH;
}
```
If the prior is implemented such that logH might be low (which might cause a rejection based on the prior), then all the expensive computation is a waste of time. In this case, it’s more efficient to pre-reject the proposal based on the prior only. This has to be done in a way which satisfies overall detailed balance. The trick is to insert a few lines:
```
double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    logH += something.perturb(rng); // Perturb some parameters

    // Pre-rejection trick
    if(rng.rand() >= exp(logH)) // Reject?
        return -1E300;
    else
        logH = 0.0;

    expensive(); // Do an expensive computation
    return logH;
}
```
In some models (e.g., where `something` is an RJObject with a log-uniform prior for the number of components) I have achieved a factor of $\sim$ two speedup by doing this.


## New video about overfitting

Recently I popped across the road to AUT to give a talk for their statisticians and applied mathematicians. In the Q&A session, I got asked about overfitting, and had the same experience I described in this old post. This motivated me to elaborate on that post, which I did in the form of a video talk.

Here it is – I hope you enjoy it.

There’s one thing I wasn’t completely clear about towards the end of the talk, in the bit with the red and green bars where I discuss trans-dimensional models. The green parts are meant to represent the regions of parameter space that fit the data. The regions that overfit the data will be a tiny subset of the green bars, even in the complex model on the right hand side of the slides. Even if you conditioned on the model all the way on the right, you wouldn’t get overfitting unless you optimised within that model.