A Rosenbrock challenge for MCMC folks…

Note: After I posted this the first time, I started to distrust my own results. I’m fairly confident the properties of the problem are something like what my figures show, but perhaps different in a couple of details. I think the posterior here has some long tails with low density, which are only picked up by sampling when you have a very large number of samples. So you should distrust any moments I quoted, at least somewhat (e.g., my plot of the correlation matrix).

The “Rosenbrock function” is a function of two real numbers which is a bit tough for numerical optimisation methods to find the minimum of. People interested in efficient Monte Carlo methods have also used it to test their methods, usually by defining a “likelihood function” L(x, y) = \exp\left[-f(x,y)\right] where f(x,y) is the Rosenbrock function. If you multiply this by a uniform distribution “prior”, then you have a nice challenging distribution to sample from.

On the Wikipedia page, there are two higher-dimensional generalisations of the Rosenbrock function, which can be used to make high dimensional sampling problems. The first is just a product of independent 2D Rosenbrock problems, so it’s a bit boring, but the second one (which the Wikipedia page calls more involved) is interesting. I found it gets even more interesting if you double the log likelihood (square the likelihood) to sharpen the target density.

My modified 50-dimensional Rosenbrock distribution is now one of the examples in DNest4, and I’ve also put a 10-dimensional version of it in my Haskell implementation of classic Nested Sampling (written with Jared Tobin). The challenge is to sample the posterior (and calculate the marginal likelihood?) for a problem with N parameters \{x_0, ..., x_{N-1}\}, where the priors are

x_i \sim \textnormal{Uniform}(-10, 10)

and the log likelihood is

\ln(L) = - 2 \sum_{i=0}^{N-2} \left[100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right].

I’m indexing from zero for consistency with my C++ and Haskell code. One interesting thing is that DNest4 seems to work on this problem when N=50 (but needs fairly cautious numerical parameters), whereas classic NS with similar effort peters out, getting stuck in some kind of wrong mode. The “backtracking” in DNest is saving its ass. To be honest, I can’t be sure I’ve got the right answers, as I don’t know ground truth.
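
In case anyone wants to try this in another package, here is a minimal Python sketch of the prior and (doubled) log likelihood defined above. It’s just for illustration and isn’t the DNest4 or Haskell implementation.

import numpy as np

N = 50  # number of parameters

def log_prior(x):
    # Uniform(-10, 10) prior on each coordinate; minus infinity outside the box
    if np.any(np.abs(x) > 10.0):
        return -np.inf
    return -N*np.log(20.0)

def log_likelihood(x):
    # Minus twice the 'more involved' N-dimensional Rosenbrock function
    return -2.0*np.sum(100.0*(x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

# Sanity check at x = (1, ..., 1), where ln(L) should be zero
print(log_prior(np.ones(N)), log_likelihood(np.ones(N)))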

Running DNest on this for half an hour (= 1.3 billion likelihood evaluations! What a time to be alive) I got a log evidence estimate of \ln(Z) = -288.6 and an information of H = 262 nats. Some of the marginal distributions are really interesting. Here’s one with a cool long tail:

[Figure: posterior marginal distribution of one parameter, showing a long tail]
It looks even more awesome if you plot the points from the DNest4 target density, rather than just the posterior. Now it’s obvious why this problem is hard – what looks like the most important peak for most of the run ends up not being that important:

[Figure: points from the DNest4 target density for the same parameter]

Here’s the correlation matrix of all 50 parameters:

[Figure: correlation matrix of all 50 parameters]

It’s interesting that the band of strong correlations around the diagonal gets fatter for the later parameters. I wouldn’t have predicted that from inspecting the equation, but the symmetry that seems to be there might be an illusion, since there are no “periodic boundary conditions”. Perhaps the ‘tail’ part of the distribution becomes more important for the later coordinates.

What do you think? Have I done something wrong, or is the run stuck somewhere misleading? Or is the problem really this freaky? I’d love to hear how other packages perform on this as well, and whether you got the same results.

Posted in Computing, Inference | Leave a comment

A frequentist does his maths homework

Question 1: Solve the quadratic equation x^2 + 4x - 1 = 0.

Answer: The two solutions are given by the quadratic formula

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

where a=1, b=4, and c=-1. Therefore, after some simplification, the two solutions are x=-2 + \sqrt{5} \approx 0.236 and x=-2 - \sqrt{5} \approx -4.236.
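
Spelling out the simplification, with a=1, b=4, and c=-1:

x = \frac{-4 \pm \sqrt{4^2 - 4(1)(-1)}}{2(1)} = \frac{-4 \pm \sqrt{20}}{2} = -2 \pm \sqrt{5}.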

I don’t much like how there is an ambiguity here. It seems to suggest we’d need more information if we actually wanted to know x in a real problem. Sure, if we knew x was positive, we’d take the positive solution, and conversely if it’s negative. But how would we ever know that?

To solve this problem, I propose an alternative methodology. Let’s assume x=-2 + \sqrt{5} \approx 0.236 is actually the solution. From this we can prove that x^2 + 4x - 1 = 0. And since zero is greater than -1, we can also say x^2 + 4x - 1 > -1. Yep. I propose this as a general procedure for solving quadratic equations. Using the quadratic formula, take the root with the plus sign, not the minus sign. Then use it to derive an inequality that you know for certain is true. This way, you don’t need to rely on extra assumptions to determine which root is correct, and there is no ambiguity, unlike in the \pm religion.

Posted in Inference | Leave a comment

Gains from trade versus(?) subjective wellbeing

This year I’ve been learning basic economics. It’s a cool subject. One interesting concept is “gains from trade”. The idea is that a person probably only participates in a trade if they think they’d benefit from it. If two parties are both doing this, and agree to trade, they both become better off. For example, suppose I want/need a can of Monster, and would be willing to pay up to $4.60 (New Zealand dollars) for one. I go into a shop, which would be willing to sell me the Monster for as low as $3.50, but sets the price at $4.00 to make a profit. When I buy the drink, I am 60 cents better off (because I got something I reckoned was worth $4.60 to me, but got it for $4.00), and the shop is 50 cents better off also. My 60c + their 50c = $1.10 of gains from trade.

Apparently free markets maximise gains from trade. That’s cool, but it left me wondering about situations where that doesn’t seem like the right thing to do. For example, if a doctor in a private clinic could cure either 1) a dying poor person or 2) a billionaire with a broken finger, the billionaire would probably be willing to pay a lot more money, because the money doesn’t matter so much to a billionaire. The gains from trade would be high as the doctor would receive heaps of money. But treating the dying person would presumably create more wellbeing and less suffering in the universe.

So, what should we do? I decided to try to model this using tools I know. So I came up with a statistical mechanics-like model for the situation, and used DNest4 to compute the results. I assumed a situation with 100 doctors available, and 1,000 patients wanting treatment. The patients all varied in the severity of their conditions (i.e. how much wellbeing would improve if they got treatment) and their wealth. Each patient’s willingness to pay was determined by an increasing function of these two factors (wealth and severity of the health problem).
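
To give a flavour of the setup, here is a toy Python sketch of it. The functional forms (lognormal wealth, uniform severity, willingness to pay = severity times the square root of wealth) are made up for illustration, and instead of running DNest4 over allocations it just compares an auction-style allocation (treat the 100 highest bidders) with a lottery.

import numpy as np

rng = np.random.default_rng(0)
num_doctors, num_patients = 100, 1000

wealth = rng.lognormal(mean=0.0, sigma=1.0, size=num_patients)
severity = rng.uniform(0.0, 1.0, size=num_patients)  # wellbeing gain if treated
willingness_to_pay = severity*np.sqrt(wealth)         # increasing in both factors

def summarise(treated):
    # Total willingness to pay (a crude proxy for gains from trade)
    # and total wellbeing gain for a given set of treated patients
    return willingness_to_pay[treated].sum(), severity[treated].sum()

# Auction-style allocation: the 100 patients with the highest willingness to pay
auction = np.argsort(willingness_to_pay)[-num_doctors:]

# Lottery: any patient as likely to be treated as any other
lottery = rng.choice(num_patients, size=num_doctors, replace=False)

print("auction:", summarise(auction))
print("lottery:", summarise(lottery))

With these made-up numbers you’d expect the auction to beat the lottery on both measures, but by less on wellbeing than on willingness to pay.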

The “parameters” in DNest4 were allocations of doctors to patients; i.e., which 100 patients got treated? The “likelihood” was the gains from trade, so DNest4 found allocations that were much better, in gains-from-trade terms, than what you’d typically get from a lottery (any patient as likely to get treatment as any other). As DNest4 found high gains-from-trade solutions, I also computed the increase in subjective wellbeing. The correlation between the two is shown below (the units of the axes are arbitrary, so don’t read too much into them):

[Figure: gains from trade versus increase in subjective wellbeing for the sampled allocations]

It turns out that allocations with high gains from trade are also those with high improvements in wellbeing, but the correlation isn’t perfect. That’s why emergency departments use triage nurses instead of auctions – because it’s pretty easy to tell who’s got the most severe problem. But an auction wouldn’t be as bad as you might initially guess, and would definitely outperform a lottery.
Posted in Economics, Entropy, Inference | 3 Comments

Markets and Auckland Housing

As in many similar cities around the world, Auckland housing has become much more expensive very rapidly over the last few years. There are many reasons for this, and it is having various effects, positive for some (e.g., those whose houses have increased in value) and negative for others (e.g., those who have never bought a house and would like to, but cannot at current prices; and, further down the income scale and more worryingly, those who cannot afford rents). There have been heaps of articles on this from all sorts of different angles, such as this one this morning. I agree with some bits of the article, but this sentence got my goat:

“Only the most laissez faire of free-marketeers still believe that this housing market is working as it should, as increasing numbers of families are forced to sleep in their cars and young Kiwis give up any hope of owning their own home.”

As my loyal reader knows, I have been trying to learn basic economics this year, as a hobby project, like how I learned Spanish starting a couple of years ago. I have also become a lot more open to classical liberal ideas than I used to be. However, it is a myth that all economists are laissez faire non-interventionists. Economics is a properly heterodox field, and Milton Friedman lovers are only a subset of the profession (and are probably right on many things and wrong on others). Anyway, I thought the sentence was a lazy throwaway line to appeal to some popular falsehoods that “everyone knows”: i) only greedy rich people support free markets, and they don’t care about poor people; ii) the current system is a free market; iii) therefore it is inherently perfect (as opposed to probably better than many alternatives you might try to come up with).

So I started thinking about what market forces are involved in the housing crisis and how prices have mitigated its effects, even if they haven’t “solved” it. First, though, some familiar reasons why prices have gone up:

i) Immigration to Auckland is high, increasing demand (= more people are willing and able to pay more for the same thing than used to be the case).

ii) People who have money to invest see housing as an option for this, and substitutes have become less attractive (rates of return on bonds and cash are low, and shares have also become expensive over the same time period as houses).

iii) People seeing the fast price increases are incentivised to buy houses in expectation of becoming wealthier in the future as a result (i.e. the ‘bubble’ explanation).

iv) Auckland is becoming a better city to live in and the NZ economy is doing quite well. If a city becomes a more desirable place to live, that increases demand for housing there (this is related to (i)).

Now, here are a few market forces, off the top of my head, that have mitigated the crisis (made it less bad than it otherwise might have been).

i) High prices signal that Auckland houses are scarce, and encourage people to economise. For example, my wife and I bought a 2 bedroom 80 square metre unit instead of a 3 bedroom 150 square metre house (which we would have bought if we’d had a lot more money). Some good friends of ours were renting a unit in Devonport but their landlord had to move back in. They looked for their next rental house in New Lynn due to high prices in Devonport. This frees up the scarce and highly-valued resource of Devonport housing.

Overcrowding is another example of economisation of scarce resources, but with more tragic human consequences.

ii) High prices act as an incentive for developers to build more houses, since they can make a profit from this activity when houses sell for a lot of money.

iii) High prices act as an incentive to get people who have houses beyond their current needs to sell. For example, a retired couple downsizing and funding their retirement with the proceeds.

IMHO, one of the major problems is that effect (ii) has been hamstrung by how difficult and restrictive the rules are around building more houses. The council’s “Unitary Plan” should improve this situation somewhat. It will do this by making the rules around building a little bit more free-markety, not less. Basically, the fact there are problems with housing does not mean that markets aren’t doing useful things. They could be doing more useful things and they will once we let them.

Posted in Economics | Leave a comment

A JAGS-like interface to DNest4

As you probably know, I use Diffusive Nested Sampling (the latest implementation of which is DNest4) to do almost all of my data analysis. It works on pretty much all problems I throw at it, as long as the likelihood function is fairly fast to evaluate. One barrier to increasing its popularity has been that you needed to know C++ to implement models (priors and likelihoods, i.e., models of prior information). You also had to implement Metropolis proposals yourself, which can be a fair bit of work. Things have improved lately on the language front, as you can implement models in Python and Julia. You will lose out on computational efficiency, though. C++ is still the best way to implement a model, if you can do it.

In C++, proposals for hierarchical models can get a bit annoying, and the RJObject class helps a bit with that. But still, hardly any of my users are going to be good at C++ and want to use a template class to run a routine data analysis. A simpler solution is needed.

Since I learned about them, I’ve admired the model specification languages used by JAGS, OpenBUGS, and Stan, and have wanted the option to specify a model in something simple like these, which then gets translated into C++ code with sensible default proposals, so you can run DNest4.

Now it exists! It’s experimental, and not covered in the DNest4 paper, but is basically working. As an example, here’s a simple data analysis demo. Some JAGS code for a simple linear regression with naive uniform priors is:

model
{
    # Slope and intercept
    m ~ dunif(-100, 100)
    b ~ dunif(-100, 100)

    # Noise standard deviation
    log_sigma ~ dunif(-10, 10)
    sigma <- exp(log_sigma)

    # p(data | parameters)
    for(i in 1:N)
    {
        y[i] ~ dnorm(m*x[i] + b, 1/sigma^2)
    }
}

And an R list (approximately the same as a Python dictionary) containing some data is

data = list(x=c(1, 2, 3, 4, 5), y=c(1, 2, 3, 3.9, 5.1), N=5)

Now, how would I implement the same thing in DNest4? I could write a C++ class, very much like the one given in the paper (and for a research problem, I would). But an alternative way is to use DNest4’s experimental ‘model builder’, which is the JAGS-like interface (but in Python) I’ve been wanting. Here is equivalent model code for the linear regression. Writing this feels kind of like a cross between using JAGS and using daft, the DAG-plotter by Foreman-Mackey and Hogg.

First, the data needs to be in a Python dictionary (containing floats, ints, and numpy arrays of floats & ints):

 data = {"x": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),\
         "y": np.array([1.0, 2.0, 3.0, 3.9, 5.1]),\
         "N": 5}

Then, construct a ‘Model’:

# Create the model
model = bd.Model()

# Slope and intercept
model.add_node(bd.Node("m", bd.Uniform(-100.0, 100.0)))
model.add_node(bd.Node("b", bd.Uniform(-100.0, 100.0)))

# Noise standard deviation
model.add_node(bd.Node("log_sigma", bd.Uniform(-10.0, 10.0)))
model.add_node(bd.Node("sigma", bd.Delta("exp(log_sigma)")))

# p(data | parameters)
for i in range(0, data["N"]):
    name = "y{index}".format(index=i)
    mean = "m*x{index} + b".format(index=i)
    model.add_node(bd.Node(name, bd.Normal(mean, "sigma"), observed=True))

You use the add_node() function to add nodes (parameter or data quantities) to the model. For the Node constructor, the first argument is the name, and the second is the prior distribution.

The loop part makes nodes for the data, called y0, y1, y2, and so on, by using Python strings. To signal that a node is observed (so that the generated C++ puts a term in the log likelihood rather than in the prior-related parts of the code), pass observed=True.

To generate and compile the C++ code, the Python script needs to be in the code/Templates/Builder directory, and then the following Python will generate the C++:

# Create the C++ code
bd.generate_h(model, data)
bd.generate_cpp(model, data)

# Compile the C++ code so it's ready to go
import os
os.system("make")

Then you can run the executable just as you could if you had implemented the model by writing a C++ class. Sweet!

The code/Templates/Builder directory includes this example (in for_blog.py) and a few more, including the old BUGS “Rats” example. Feel free to ask me for more details about this feature if you would like help using it!

Posted in Inference, Personal | Leave a comment

An article about experts, and joining Heterodox Academy

Any followers of this blog who aren’t on Facebook or Twitter might be interested in this article I wrote about expertise and its limits.

In related news, I have signed up as a supporter of Heterodox Academy, a group advocating viewpoint diversity in academia as a bulwark against groupthink, both within specific research areas and in the greater academic community. If you know me, you probably know that I used to be quite left-wing and a supporter of all the popular causes that academics tend to be enthusiastic about. Over the last 18 months I have become disillusioned and frustrated with much of this. If someone says something popular but false (or at least debatable), it shouldn’t require bravery to openly question it. But it does, and that’s the opposite of a good strategy for pursuing truth.

Posted in Personal | 6 Comments

A Co-Blogger!

Luke Barnes reminds readers of his blog Letters to Nature that it’s not just his blog. 🙂

Posted in Uncategorized | Leave a comment