## A Rosenbrock challenge for MCMC folks…

Note: After I posted this the first time, I started to distrust my own results. I'm confident the properties of the problem are roughly what my figures show, though a couple of details may be off. I think the posterior here has some long tails with low density, which only get picked up by sampling when you have a very large number of samples. So you should distrust any moments I quoted, at least somewhat (including my plot of the correlation matrix).

The “Rosenbrock function” is a function of two real numbers whose minimum is a bit tough for numerical optimisation methods to find. People interested in efficient Monte Carlo methods have also used it as a test problem, usually by defining a “likelihood function” $L(x, y) = \exp\left[-f(x,y)\right]$, where $f(x,y)$ is the Rosenbrock function. Multiply this by a uniform “prior” distribution and you have a nice challenging distribution to sample from.
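For concreteness, here's a minimal sketch of that construction in plain Python, using the standard 2D Rosenbrock form $f(x,y) = 100(y - x^2)^2 + (1 - x)^2$ (the function names are mine):

```python
import math

def rosenbrock(x, y):
    """Standard 2D Rosenbrock function; global minimum f(1, 1) = 0."""
    return 100.0 * (y - x ** 2) ** 2 + (1.0 - x) ** 2

def likelihood(x, y):
    """Turn the optimisation target into a sampling target: L = exp(-f)."""
    return math.exp(-rosenbrock(x, y))
```

The likelihood peaks at $(1, 1)$ with value $1$, and decays along the famous curved “banana” valley.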

The Wikipedia page gives two higher-dimensional generalisations of the Rosenbrock function, which can be used to build high-dimensional sampling problems. The first is just a product of independent 2D Rosenbrock problems, so it’s a bit boring, but the second (which the Wikipedia page calls “more involved”) is interesting. I found it gets even more interesting if you double the log likelihood (i.e. square the likelihood) to sharpen the target density.

My modified 50-dimensional Rosenbrock distribution is now one of the examples in DNest4, and I’ve also put a 10-dimensional version of it in my Haskell implementation of classic Nested Sampling (written with Jared Tobin). The problem is to sample the posterior (and, if you like, calculate the marginal likelihood) for a problem with $N$ parameters $\{x_0, ..., x_{N-1}\}$, where the priors are

$x_i \sim \textnormal{Uniform}(-10, 10)$

and the log likelihood is

$\ln(L) = - 2 \sum_{i=0}^{N-2} \left[100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right]$.

I’m indexing from zero for consistency with my C++ and Haskell code. One interesting thing is that DNest4 seems to work on this problem when $N=50$ (though it needs fairly cautious numerical parameters), whereas classic NS with similar effort peters out, getting stuck in some kind of wrong mode. The “backtracking” in DNest is saving its ass. To be honest, I can’t be sure I’ve got the right answers, since I don’t know the ground truth.
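To make the target concrete, here's a minimal plain-Python sketch of the prior and log likelihood above (the function names and structure are mine, not the actual DNest4 or Haskell code):

```python
import random

def log_likelihood(xs):
    """ln L = -2 * sum_{i=0}^{N-2} [100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2]."""
    return -2.0 * sum(
        100.0 * (xs[i + 1] - xs[i] ** 2) ** 2 + (1.0 - xs[i]) ** 2
        for i in range(len(xs) - 1)
    )

def draw_from_prior(n, rng=random):
    """Independent Uniform(-10, 10) priors on each coordinate."""
    return [rng.uniform(-10.0, 10.0) for _ in range(n)]
```

As a sanity check, the global maximum of the log likelihood is $0$, attained when every $x_i = 1$.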

Running DNest on this for half an hour (= 1.3 billion likelihood evaluations! What a time to be alive), I got a log-evidence estimate of $\ln(Z) = -288.6$ and an information of $H = 262$ nats. Some of the marginal distributions are really interesting. Here’s one with a cool long tail:
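A rough consistency check on why classic NS struggles here: classic Nested Sampling compresses the prior by about $1/n_{\textnormal{live}}$ nats per iteration, so reaching the bulk of the posterior takes roughly $n_{\textnormal{live}} \times H$ iterations. With $H \approx 262$ nats that's a long run (the live-point count below is a hypothetical illustration, not a setting I used):

```python
H = 262.0      # information in nats, from the run quoted above
n_live = 1000  # hypothetical live-point count for a classic NS run
# Classic NS compresses ~1/n_live nats per iteration, so the bulk of
# the posterior mass is reached after roughly n_live * H iterations.
iterations_needed = n_live * H
```

That's over a quarter of a million iterations before the sampler even reaches the typical set, with plenty of opportunity to get stuck along the way.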

It looks even more awesome if you plot the points from the DNest4 target density, rather than just the posterior. Now it’s obvious why this problem is hard – what looks like the most important peak for most of the run ends up not being that important:

Here’s the correlation matrix of all 50 parameters:

It’s interesting that the diagonals get fatter for the later parameters. I wouldn’t have predicted that from inspecting the equation, but the symmetry that seems to be there might be an illusion: there are no “periodic boundary conditions”, so the first and last coordinates enter the sum differently from the middle ones. Perhaps the ‘tail’ part of the distribution becomes more important for the later coordinates.

What do you think? Have I done something wrong, or is the run stuck somewhere misleading? Or is the problem really this freaky? I’d love to hear how other packages perform on this as well, and whether you got the same results.