Some software I’ve been working on

Hi everyone. I’ve recently been expanding my computational horizons, and that has prompted me to put some of my trans-dimensional Bayesian inference codes for astronomy into a more usable state for others, with YAML configuration files, nonzero (though still not thorough) documentation, and things like that. I might eventually write up software paper(s) if one becomes popular.

As usual, my methods are not the fastest, and they are best suited to smaller datasets where you care about being a purist and getting the most out of the data that you can.

Here are a few links:

StarStudded: Probabilistic catalogs for crowded stellar fields

Lensing2: Gravitational lens reconstruction with substructures

Celery: Recover quasi-periodic oscillations from a time series (think asteroseismology)

Any feedback is appreciated!

Posted in Computing, Inference | Leave a comment

DNest4 version 0.2.0

G’day everyone. This post is to update you on some work I’ve been doing (along with some collaborators, notably João Faria) on DNest4. So, you’ll likely only care if you’re a user of that package. There have been enough additions recently that I’ve decided to call it a new version, 0.2.0. Everything is backwards compatible, so don’t stress. But some of the new features might be useful to you.

What are they? Here’s a partial list:

Probability Distribution Classes

There is now a selection of classes available to represent some standard-ish probability distributions. This can make it easier to implement the from_prior and perturb member functions in a model. For example, suppose you want a normal prior centered at 30 with a standard deviation of 10. In the past, you had to implement it like this:


void MyModel::from_prior(DNest4::RNG& rng)
{
    x = 30.0 + 10.0*rng.randn();
}

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;

    logH -= -0.5*pow((x - 30.0)/10.0, 2);
    x += rng.randh();
    logH += -0.5*pow((x - 30.0)/10.0, 2);

    return logH;
}
Now you can use the class DNest4::Gaussian, an object of which represents a Gaussian (normal) distribution. An alternative to the above is then:

#include "DNest4/code/Distributions/Gaussian.h"

void MyModel::from_prior(DNest4::RNG& rng)
{
    DNest4::Gaussian gaussian(30.0, 10.0);
    x = gaussian.generate(rng);
}

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    DNest4::Gaussian gaussian(30.0, 10.0);
    logH += gaussian.perturb(x, rng); // Perturbs x in-place
    return logH;
}

This means you don’t have to remember a bunch of details about normal distributions; those are taken care of by the Gaussian class. In principle, instead of constructing a new Gaussian object twice, I should have made a single one a member of the MyModel class and used it in both from_prior and perturb, but you get the idea.
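That member-variable version might look something like the following. This is a runnable sketch only: the RNG and Gaussian classes here are simplified stand-ins I wrote so the example compiles on its own (the real ones live in the DNest4 headers, and real proposals use the heavy-tailed randh() step rather than randn()).

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Simplified stand-ins for DNest4::RNG and DNest4::Gaussian, so this
// sketch compiles on its own. The generate/perturb interface follows
// the usage shown in the post.
class RNG
{
    private:
        std::mt19937 engine;
        std::normal_distribution<double> normal;

    public:
        RNG(unsigned int seed=0)
        :engine(seed)
        {
        }

        double randn()
        { return normal(engine); }
};

class Gaussian
{
    private:
        double center, width;

    public:
        Gaussian(double center, double width)
        :center(center), width(width)
        {
        }

        double generate(RNG& rng) const
        { return center + width*rng.randn(); }

        // Perturb x in-place and return the log prior-density ratio.
        // NB: real DNest4 uses a heavy-tailed randh() step, not randn().
        double perturb(double& x, RNG& rng) const
        {
            double logH = 0.0;
            logH -= -0.5*std::pow((x - center)/width, 2);
            x += width*rng.randn();
            logH += -0.5*std::pow((x - center)/width, 2);
            return logH;
        }
};

// The member-variable pattern: one Gaussian object shared by
// both from_prior and perturb.
class MyModel
{
    private:
        double x;
        Gaussian gaussian;

    public:
        MyModel()
        :gaussian(30.0, 10.0)
        {
        }

        void from_prior(RNG& rng)
        { x = gaussian.generate(rng); }

        double perturb(RNG& rng)
        { return gaussian.perturb(x, rng); }

        double get_x() const
        { return x; }
};
```

This way the prior's center and width appear in exactly one place, rather than being repeated in two member functions.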

The collection of available distributions is a bit eclectic (look in the Distributions subdirectory to see them). I’m happy to accept pull requests if anyone wants to implement boring old things like gammas. 🙂 I might get around to it at some point.

Better automatic detection of the number of levels

One of the annoying things about Nested Sampling is deciding how much compression is needed, i.e., how many iterations (in classic NS) or how many levels (in Diffusive NS). For a while, DNest4 has tried to guess the appropriate number of levels on the fly if you set the maximum number of levels to 0 in your OPTIONS file. However, the method I used previously failed on some problems: when the mixing wasn’t that good, it wouldn’t create quite enough levels.

This is always a risk, but I think the modified version of the recipe I used for this is better than the old one. I’ll spare you the details.

More aggressive compilation options

In all the Makefiles, compilation is now done with -O3 -march=native, which can slightly improve performance. However, it means any binaries generated may be suboptimal (or even crash) if you try to run them on a different computer from where you compiled them. I don’t think anyone is doing this, but in case you are, you’ll just need to get rid of the -march=native.
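If you do need a portable binary, the change is a one-line edit to the compiler flags in the relevant Makefile. Something like this (illustrative only; the exact variable name and accompanying flags in each Makefile may differ):

```make
# As shipped: fastest, but ties the binary to the CPU it was built on
CXXFLAGS = -std=c++11 -O3 -march=native

# Portable alternative: drop -march=native
# CXXFLAGS = -std=c++11 -O3
```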

An efficiency trick: pre-rejection

This is something I should have been doing ages ago, but some of the examples now demonstrate it. Suppose your model has an expensive part in it, so its perturb function looks like this:

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    logH += something.perturb(rng); // Perturb some parameters
    expensive(); // Do an expensive computation
    return logH;
}

If the prior is implemented such that logH might be very negative (likely causing a rejection on the basis of the prior alone), then all the expensive computation is a waste of time. In that case, it’s more efficient to pre-reject the proposal based on the prior only. This has to be done in a way that satisfies overall detailed balance. The trick is to insert a few lines:

double MyModel::perturb(DNest4::RNG& rng)
{
    double logH = 0.0;
    logH += something.perturb(rng); // Perturb some parameters

    // Pre-rejection trick
    if(rng.rand() >= exp(logH)) // Reject?
        return -1E300;
    logH = 0.0; // Prior ratio already accounted for by the pre-rejection

    expensive(); // Do an expensive computation

    return logH;
}

In some models (e.g., where something is an RJObject with a log-uniform prior for the number of components) I have achieved a speedup by a factor of about two by doing this.
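Why is pre-rejection valid? Here is my own quick gloss (a sketch, not anything official from the package). Suppose the full Metropolis acceptance ratio factorises into a prior part and a remaining part, H = H_1 H_2. Accepting with probability

\min(1, H_1)\,\min(1, H_2)

still satisfies detailed balance, because (for a symmetric proposal) the probability flow from x to x' is

\pi_1(x)\,\pi_2(x)\,\min\left(1, \frac{\pi_1(x')}{\pi_1(x)}\right)\min\left(1, \frac{\pi_2(x')}{\pi_2(x)}\right) = \min\left(\pi_1(x), \pi_1(x')\right)\,\min\left(\pi_2(x), \pi_2(x')\right),

which is symmetric under swapping x and x'. The early return implements the \min(1, H_1) factor, and resetting logH to zero ensures the sampler only applies the remaining factor afterwards.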

Posted in Computing, Inference | 2 Comments

New video about overfitting

Recently I popped across the road to AUT to give a talk for their statisticians and applied mathematicians. In the Q&A session, I got asked about overfitting, and had the same experience I described in this old post. This motivated me to elaborate on that post, which I did in the form of a video talk.

Here it is – I hope you enjoy it.

There’s one thing I wasn’t completely clear about towards the end of the talk, in the bit with the red and green bars where I discuss trans-dimensional models. The green parts are meant to represent the regions of parameter space that fit the data. The regions that overfit the data will be a tiny subset of the green bars, even in the complex model on the right hand side of the slides. Even if you conditioned on the model all the way on the right, you wouldn’t get overfitting unless you optimised within that model.

Posted in Inference | Leave a comment

News and updates

Just a quick post to let you all know about some recent goings-on in my research over the last couple of months.

  • Former MSc student and current PhD student Oliver (Ollie) Stevenson was profiled on our department’s homepage. Ollie is continuing his work on cricket, but we have much more data (from NZ cricket!) and many more questions than we did for his MSc.
  • My paper Computing Entropies with Nested Sampling was published in Entropy and featured as the cover story. Ignore the mistake in Equation 14… 😉
  • Speaking of Nested Sampling, my honours student Syarafana Abdul Rahman is rigorously testing what happens to Nested Sampling runs under imperfect MCMC moves. I might post any interesting conclusions later…
  • We’re almost halfway through Semester 2 here at Auckland, and I’ve been teaching my intro Bayesian course. Inspired by some others who do it really well, I’m posting my lecture recordings online.
  • Earlier this year I developed a custom-built DNest4 model for X-ray diffraction data for geophysicist Michael Rowe, and we’re currently writing that up. By the way, I’m open to this kind of collaboration in general – if you have an inference problem you want me to take a look at, feel free to get in touch and I’ll see if it’s something I can help with. I’m happy to work for co-authorship (for academic collaborators) or payment (for commercial ones).

Have a great day!

Posted in Computing, Inference, Information, Personal | Leave a comment

Measure theory and sports team selection

For several years now I have been reading this paper by Kevin Knuth and John Skilling, about the foundations of measure theory, probability, and MaxEnt. I still only 70% understand it, but the idea is interesting and appeals to my sensibilities. Kevin has a much gentler essay about it, and I’ve also had a go at using it to explain why addition would appear in formal arguments about ethics.

The main result of Knuth and Skilling, which gets recycled throughout their paper, is that addition is the unique (up to regrading – a caveat I shall ignore) way of assigning a value to a thing that is composed of a combination of other things, when there is also an ordering (one thing is bigger/better/has a higher number associated with it [whatever the interpretation], than another).

When something is made up of a combination of x, y, and z, the value associated with the combination is the sum of the values of x, y, and z on their own:

m(x \vee y \vee z) = m(x) + m(y) + m(z).

Here m() is a function that takes some kind of object (whatever x, y, and z are) and returns a real number, and I have assumed x, y, and z are disjoint (don’t worry about it). The main properties you need are associativity (it doesn’t matter in what order you combine things):

x \vee (y \vee z) \equiv (x \vee y) \vee z

and order (if thing x is bigger than thing y, then thing “x combined with z” is bigger than thing “y combined with z“):

m(x) > m(y) \implies m(x \vee z) > m(y \vee z)

m(x) > m(y) \implies m(z \vee x) > m(z \vee y)

So far, all this is very abstract. But applications can be super trivial. For example, if I have a bag with 3 apples, a bag with 10 apples, and a bag with 14 apples, the total number of apples is 3 + 10 + 14 = 27. These numbers are consistent with the underlying ordering (which collections of apples are bigger than others) and the associative symmetry of combining collections of apples. I could even have the same three bags of apples, and apply the sum rule to different concepts about them. For example, the sum rule also applies to the masses of the apples, and (usually) the prices of the apples.

Sports teams

The other day I was wondering whether this had unforeseen applications. I came up with a sports example that works — until it doesn’t. What if I want to select the best possible cricket team, and I have some numbers that describe how good each player is. Maybe something like this (I’m using arbitrary made-up quality numbers here):

Player            Quality
Steve Smith       95.6
Brendon Brewer    1.9
Dale Steyn        90.4

We can combine the players to build a team. The combination operator (set union) is associative — combining Smith with Brewer and then adding Steyn gives the same result as combining Brewer and Steyn and then adding Smith. That’s one of the properties needed for the sum rule. The other is order, which seems plausible here. If one player is better than another, the combinations formed from the better player are better than the combinations formed from the worse player.

So we could measure the quality of the Smith-Brewer-Steyn team by doing addition, getting the result 95.6 + 1.9 + 90.4 = 187.9. The theory would work, for example, by showing that taking me out of a team and putting in virtually anyone else who’s even played cricket at high school level will result in a better team.
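As a sketch, the additive theory of team quality is just a sum over the (made-up) quality numbers from the table. The Player struct and function name here are illustrative, not from anywhere in particular:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <string>
#include <vector>

// A player and their (arbitrary, made-up) quality number
struct Player
{
    std::string name;
    double quality;
};

// The additive measure: team quality = sum of the players' qualities
double team_quality(const std::vector<Player>& team)
{
    return std::accumulate(team.begin(), team.end(), 0.0,
        [](double total, const Player& p)
        { return total + p.quality; });
}
```

Because addition is associative, the order in which players are combined into the team doesn't change the answer, which is one of the two properties the Knuth-Skilling result requires.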

But the addition theory of team selection breaks down eventually. Consider, say, a team composed of 11 Glenn McGrath clones, each with a quality of 93, for a total team quality of 1023. McGrath was among the best bowlers of all time, but he was bad at batting. This team would score about 80 runs per innings and lose most matches as a result. Why doesn’t the theory work? The associative property still holds. It must be order that breaks down. And it does.

Let x = \{\textnormal{McGrath}_1, ... \textnormal{McGrath}_8\} be a partial cricket team composed of 8 McGrath clones. We could combine x with yet another McGrath clone. Or we could combine it with an okay batsman with a quality of 50. The order property says that since McGrath is higher quality than the okay batsman,

m(\textnormal{McGrath}_i) > m(\textnormal{OkayBatsman})

it should always be better to select yet another McGrath clone than it is to select an okay batsman

m(x \vee \textnormal{McGrath}_i) > m(x \vee \textnormal{OkayBatsman}).

This is false. Associativity without order implies no sum rule. Early in the selection process, when you don’t have to worry about the balance of the team, the order property approximately holds, so you get sum-rule behaviour; it breaks down as you select more and more players.

Posted in Cricket, Inference, Mathematics | Leave a comment

Introducing ‘Brews’, and two cool things from the internet

A few brief items…

  • I’ve made a little web page for sharing results of analyses I do (mostly these will be posterior samples and marginal likelihood values). I’ll aim to put things up when they’re sufficiently mature and ‘finished’, in the hope that someone might use them for actual science.
  • Check out this fascinating post about an experiment on reddit, where anyone with a reddit account could contribute to an image by painting one pixel at a time, but had to wait a few minutes between pixel edits. It’s amazing what emerged (via Diana Fleischman on Twitter).
  • A professor at Carnegie Mellon has put a twist on multiple choice exams, by asking students to assign a probability distribution over the possible answers, and then grading them using the logarithmic score. This is sufficiently awesome that I might try it out one day. One way of improving this (and scaring students even more) would be to allow the students to assert a probability distribution that doesn’t factor into an independent distribution for each question (via Daniela Huppenkothen).
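The logarithmic score in that last item is simple enough to sketch in a few lines. This is illustrative code only; the function name and grading details are mine, not the professor's actual scheme:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Logarithmic score for one multiple-choice question: the student
// assigns a probability to each option, and the score is the log of
// the probability they gave to the correct answer.
double log_score(const std::vector<double>& probs, int correct)
{
    return std::log(probs[correct]);
}
```

A student who is certain and right scores log(1) = 0, the best possible; hedging across options costs a little, and confidently putting near-zero probability on the truth is catastrophic, since log(p) diverges to minus infinity as p goes to 0. That divergence is exactly what makes the score "proper": the expected score is maximised by reporting your honest probabilities.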
Posted in Inference, Personal | 2 Comments

Faculty positions with us!

Our department is going through a period where lots of our long-time faculty members are reaching retirement age, and we’re currently advertising for some new faculty to join us. The ads can be found at this address.

Currently, there are two openings at lecturer level (~ assistant professor in the US system), which is appropriate for someone who’s just getting/got their PhD or has done one or two postdocs [stats is a bit different from physics, in that some people go straight from PhD to lecturer without doing postdocs]. There is also an opening for a senior lecturer or associate professor, which is more senior (everything except the very top, which is full professor).

If you are good, and think New Zealand is good, please apply! Great things about working here:

  • A large department with lots of lovely people;
  • Quite a few applied folks who collaborate with other disciplines;
  • Auckland is a pretty awesome medium-sized (on a log scale) city which is small enough that you can access NZ outdoor activities easily if you like that, yet big enough that famous people come here.

The only downsides are the remote location and the fairly high cost of living (check out Expatistan to compare).


Posted in Uncategorized | Leave a comment