Introducing ‘Brews’, and two cool things from the internet

A few short items…

  • I’ve made a little web page for sharing results of analyses I do (mostly these will be posterior samples and marginal likelihood values). I’ll aim to put things up when they’re sufficiently mature and ‘finished’, in the hope that someone might use them for actual science.
  • Check out this fascinating post about an experiment on reddit, where anyone could contribute to an image by painting one pixel at a time, but had to wait a few minutes between edits. It’s amazing what emerged (via Diana Fleischman on Twitter).
  • A professor at Carnegie Mellon has put a twist on multiple choice exams, by asking students to assign a probability distribution over the possible answers, and then grading them using the logarithmic score. This is sufficiently awesome that I might try it out one day. One way of improving this (and scaring students even more) would be to allow the students to assert a probability distribution that doesn’t factor into an independent distribution for each question (via Daniela Huppenkothen).
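
For concreteness, here’s a minimal sketch of how log-score grading could work (the function and the numbers below are my own illustration, not taken from the actual course):

import numpy as np

def log_score(probs, correct):
    # Logarithmic score: the log of the probability the student assigned
    # to the answer that turned out to be correct.
    return np.log(probs[correct])

# A student who hedges between A and B, when the correct answer is A:
print(log_score({"A": 0.6, "B": 0.3, "C": 0.05, "D": 0.05}, "A"))   # about -0.51

Putting all the probability on a wrong answer scores negative infinity, which is presumably where the fear comes in.
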
Posted in Inference, Personal

Faculty positions with us!

Our department is going through a period where lots of our long-time faculty members are reaching retirement age, and we’re currently advertising for some new faculty to join us. The ads can be found at this address.

Currently, there are two openings at lecturer level (~ assistant professor in the US system), which is appropriate for someone who’s just getting/got their PhD or has done one or two postdocs [stats is a bit different from physics, in that some people go straight from PhD to lecturer without doing postdocs]. There is also an opening for a senior lecturer or associate professor, which is more senior (everything except the very top, which is full professor).

If you are good, and think New Zealand is good, please apply! Great things about working here:

  • A large department with lots of lovely people;
  • Quite a few applied folks who collaborate with other disciplines;
  • Auckland is a pretty awesome medium-sized (on a log scale) city which is small enough that you can access NZ outdoor activities easily if you like that, yet big enough that famous people come here.

The only downsides are the remote location and the fairly high cost of living (check out Expatistan to compare).

[Image: Auckland]

Posted in Uncategorized

Demand curves are basically CDFs

As my loyal reader knows, I’ve been trying to learn some econ, as I find it quite fascinating and continuous with several other interests. Anyway, last night I was on the phone to Jared, and mentioned a connection I’d noticed between a basic concept in microeconomics and one in statistics. I thought the connection was obvious, but apparently he hadn’t thought of it before, and suggested I blog about it.

Frequency distribution of values

Suppose I’m selling a bike on TradeMe (non-NZ readers: substitute “EBay”). Imagine I could mind-read everyone in New Zealand about their subjective value (the maximum price they would be willing to pay) for the bike. Lots of people don’t want or need the bike (they have better uses for their money), so I’d get lots of low answers. And a few would like it quite a lot, and they’ll have high values. Suppose the result is

\textnormal{(everyone's values)} = \boldsymbol{v} = \{v_1, v_2, ..., v_n\}

where n is the population size. A histogram might look like this:

[Figure: histogram of everyone's values for the bike]

We could make a continuous approximation to this frequency distribution, using a density function f(x), whose integral is n:

\int_0^{\infty} f(x) \, dx = n

The corresponding cumulative distribution gives the number with value less than x:

\textnormal{Num}(\textnormal{value} < x) = F(x) = \int_0^x f(x') \, dx'

All good. That’s just a CDF, applied to a measure normalised to the population size n, rather than 1 (as in the case of a probability distribution).
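
A quick numerical sketch in Python (the distribution of values here is just something I invented for illustration):

import numpy as np

n = 10_000                                   # population size
rng = np.random.default_rng(0)
values = 50.0 * (1.0 + rng.pareto(2.0, n))   # made-up distribution of subjective values ($)

def F(x):
    # Number of people whose value is less than x: the un-normalised CDF above.
    return np.sum(values < x)

print(F(100.0), "people value the bike at less than $100")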

Demand Curve

Now, a demand curve is a function D_q(p) which gives the quantity demanded (the number of bikes people would want to buy) as a function of price. For example, if D_q(500) were 20, that means 20 people would want to buy the bike if its price were $500. (I’m assuming no single person would want to buy two or more). D_q(p) has a negative gradient.

If I were to set the price at $100, how many people would want to buy the bike? That’s the same thing as the number of people for whom v is greater than 100:

D_q(100) = \int_{100}^{\infty} f(x) \, dx = n - F(100).

The same argument works for all prices, not just $100:

D_q(p) = \int_{p}^{\infty} f(x) \, dx = n - F(p).

That is, the quantity demanded as a function of price is just the “complementary CDF” of people’s values. Since economists are odd, when they plot D_q(p), they put p on the y-axis, and call the plot a ‘demand curve’. After playing around with this idea a bit, I think I proved that a constant-elasticity demand curve corresponds to a Pareto distribution of values.
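
Here’s a quick sketch of that last claim. Suppose the values follow a Pareto distribution with minimum value x_m and tail index \alpha > 0. Then for p \geq x_m,

D_q(p) = n - F(p) = n \left(\frac{p}{x_m}\right)^{-\alpha}

so \ln D_q(p) = \ln n + \alpha \ln x_m - \alpha \ln p, and the elasticity d\ln D_q/d\ln p equals -\alpha everywhere: a constant-elasticity demand curve. Running the argument backwards, a constant-elasticity demand curve implies a power-law (i.e. Pareto) tail for the distribution of values.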

I hope this was interesting and/or useful.

Posted in Economics

Plausibility Theory Podcast, Episode 1: Keagan Brewer

I enjoy podcasts, so had a go at recording one. I hope you enjoy it.

 

 

Posted in Podcast

A Rosenbrock challenge for MCMC folks…

Note: After I posted this the first time, I started to distrust my own results. I am fairly confident the properties of the problem are something like what my figures show, but they could differ in a couple of details. I think the posterior here has some long tails with low density, which are only picked up by sampling when you have a very large number of samples. So you should distrust any moments I quoted, at least somewhat, as well as the plot of the correlation matrix.

The “Rosenbrock function” is a function of two real numbers which is a bit tough for numerical optimisation methods to find the minimum of. People interested in efficient Monte Carlo methods have also used it to test their methods, usually by defining a “likelihood function” L(x, y) = \exp\left[-f(x,y)\right] where f(x,y) is the Rosenbrock function. If you multiply this by a uniform distribution “prior”, then you have a nice challenging distribution to sample from.
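
For reference, the standard two-dimensional form (with the usual choice of constants) is

f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2

which has its minimum at (x, y) = (1, 1), at the end of a long curved valley.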

On the Wikipedia page, there are two higher-dimensional generalisations of the Rosenbrock function, which can be used to make high-dimensional sampling problems. The first is just a product of independent 2D Rosenbrock problems, so it’s a bit boring, but the second one (which the Wikipedia page calls “more involved”) is interesting. I found it gets even more interesting if you double the log likelihood (i.e. square the likelihood) to sharpen the target density.

My modified 50-dimensional Rosenbrock distribution is now one of the examples in DNest4, and I’ve also put a 10-dimensional version of it in my Haskell implementation of classic Nested Sampling (written with Jared Tobin). The challenge is to sample the posterior (and, if you like, calculate the marginal likelihood) for a problem with N parameters \{x_0, ..., x_{N-1}\}, where the priors are

x_i \sim \textnormal{Uniform}(-10, 10)

and the log likelihood is

\ln(L) = - 2 \sum_{i=0}^{N-2} \left[100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right].

I’m indexing from zero for consistency with my C++ and Haskell code. One interesting thing is that DNest4 seems to work on this problem when N=50 (but needs fairly cautious numerical parameters), whereas classic NS with similar effort peters out, getting stuck in some kind of wrong mode. The “backtracking” in DNest is saving its ass. To be honest, I can’t be sure I’ve got the right answers, as I don’t know ground truth.
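
For concreteness, here’s the log likelihood written out in Python (a direct transcription of the formula above, not the actual DNest4 or Haskell source):

import numpy as np

def log_likelihood(x):
    # Doubled "more involved" Rosenbrock log likelihood, as in the formula above.
    x = np.asarray(x)
    return -2.0 * np.sum(100.0*(x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

def log_prior(x):
    # Uniform(-10, 10) prior on each coordinate (log density; -inf outside the box).
    x = np.asarray(x)
    return -np.inf if np.any(np.abs(x) > 10.0) else -len(x) * np.log(20.0)

print(log_likelihood(np.ones(50)))   # the maximum possible value: 0.0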

Running DNest on this for half an hour (= 1.3 billion likelihood evaluations! What a time to be alive) I got a log evidence estimate of \ln(Z) = -288.6 and an information of H = 262 nats. Some of the marginal distributions are really interesting. Here’s one with a cool long tail:

[Figure: marginal posterior distribution of one parameter, showing a long tail]
It looks even more awesome if you plot the points from the DNest4 target density, rather than just the posterior. Now it’s obvious why this problem is hard – what looks like the most important peak for most of the run ends up not being that important:

[Figure: the same marginal, plotted using points from the DNest4 target density rather than just the posterior]

Here’s the correlation matrix of all 50 parameters:

[Figure: correlation matrix of all 50 parameters]

It’s interesting that the diagonals get fatter for the later parameters. I wouldn’t have predicted that from inspection of the equation, but the symmetry that seems to be there might be an illusion because there aren’t “periodic boundary conditions”. Perhaps the ‘tail’ part of the distribution becomes more important for the later coordinates.

What do you think? Have I done something wrong, or is the run stuck somewhere misleading? Or is the problem really this freaky? I’d love to hear how other packages perform on this as well, and whether you got the same results.

Posted in Computing, Inference

A frequentist does his maths homework

Question 1: Solve the quadratic equation x^2 + 4x - 1 = 0.

Answer: The two solutions are given by the quadratic formula

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

where a=1, b=4, and c=-1. Therefore, after some simplification, the two solutions are x=-2 + \sqrt{5} \approx 0.236 and x=-2 - \sqrt{5} \approx -4.236.
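
A quick numerical check of those roots, using numpy:

import numpy as np
print(np.roots([1.0, 4.0, -1.0]))   # roots of x^2 + 4x - 1: approximately [-4.236, 0.236]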

I don’t much like how there is an ambiguity here. It seems to suggest we’d need more information if we actually wanted to know x in a real problem. Sure, if we knew x was positive, we’d take the positive solution, and conversely if we knew it was negative. But how would we ever know that?

To solve this problem, I propose an alternative methodology. Let’s assume x=-2 + \sqrt{5} \approx 0.236 is actually the solution. From this we can prove that x^2 + 4x - 1 = 0. And since zero is greater than -1, we can also say x^2 + 4x - 1 > -1. Yep. I propose this as a general procedure for solving quadratic equations. Using the quadratic formula, take the root with the plus sign, not the minus sign. Then use it to derive an inequality that you know for certain is true. This way, you don’t need to rely on extra assumptions to determine which root is correct, and there is no ambiguity, unlike in the \pm religion.

Posted in Inference

Gains from trade versus(?) subjective wellbeing

This year I’ve been learning basic economics. It’s a cool subject. One interesting concept is “gains from trade”. The idea is that a person probably only participates in a trade if they think they’d benefit from it. If two parties are both doing this, and agree to trade, they both become better off. For example, suppose I want/need a can of Monster, and would be willing to pay up to $4.60 (New Zealand dollars) for one. I go into a shop that would be willing to sell me the Monster for as little as $3.50, but sets the price at $4.00 to make a profit.
When I buy the drink, I am 60 cents better off (because I got something I reckoned was worth $4.60 to me, but got it for $4.00), and the shop is 50 cents better off also. My 60c + their 50c = $1.10 gains from trade.
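
The same arithmetic in code form, working in cents so it stays exact:

willingness_to_pay = 460     # the most I'd pay for the Monster (cents)
willingness_to_sell = 350    # the least the shop would accept (cents)
price = 400
consumer_surplus = willingness_to_pay - price      # 60 cents
producer_surplus = price - willingness_to_sell     # 50 cents
print(consumer_surplus + producer_surplus)         # 110 cents = $1.10 gains from trade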

Apparently free markets maximise gains from trade. That’s cool, but it left me wondering about situations where that doesn’t seem like the right thing to do. For example, if a doctor in a private clinic could cure either 1) a dying poor person or 2) a billionaire with a broken finger, the billionaire would probably be willing to pay a lot more money, because the money doesn’t matter so much to a billionaire. The gains from trade would be high as the doctor would receive heaps of money. But treating the dying person would presumably create more wellbeing and less suffering in the universe.

So, what should we do? I decided to try to model this using tools I know. So I came up with a statistical mechanics-like model for the situation, and used DNest4 to compute the results. I assumed a situation with 100 doctors available, and 1,000 patients wanting treatment. The patients all varied in the severity of their conditions (i.e. how much wellbeing would improve if they got treatment) and their wealth. Each patient’s willingness to pay was determined by an increasing function of these two factors (wealth and severity of the health problem).
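
Here’s a rough toy version of that setup in Python (the distributions and the willingness-to-pay rule are invented for illustration, and instead of DNest4 it just compares a lottery with a naive sell-to-the-highest-bidders allocation):

import numpy as np

rng = np.random.default_rng(1)
n_doctors, n_patients = 100, 1000

severity = rng.exponential(1.0, n_patients)      # wellbeing gain if treated
wealth = rng.lognormal(0.0, 1.0, n_patients)     # spread of wealth
willingness_to_pay = wealth * severity           # increasing in both factors

def totals(treated):
    # Gains from trade (proxied here by total willingness to pay)
    # and total wellbeing improvement, for a given allocation.
    return willingness_to_pay[treated].sum(), severity[treated].sum()

lottery = rng.choice(n_patients, n_doctors, replace=False)   # any patient equally likely
auction = np.argsort(willingness_to_pay)[-n_doctors:]        # the highest bidders get treated

print("lottery:", totals(lottery))
print("auction:", totals(auction))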

The “parameters” in DNest4 were allocations of doctors to patients; i.e., which 100 patients got treated? The “likelihood” was the gains from trade, so DNest4 found allocations that were much better, in gains-from-trade terms, than what you’d typically get from a lottery (any patient as likely to get treatment as any other). As DNest4 found high gains-from-trade solutions, I also computed the increase in subjective wellbeing. The correlation between the two is shown below (the units of the axes are arbitrary, so don’t read too much into them):

[Figure: improvement in subjective wellbeing versus gains from trade for the sampled allocations]

It turns out that allocations with high gains from trade are also those with high improvements in wellbeing, but the correlation isn’t perfect. That’s why emergency departments use triage nurses instead of auctions: it’s pretty easy to tell who’s got the most severe problem. But an auction wouldn’t be as bad as you might initially guess, and would definitely outperform a lottery.

Posted in Economics, Entropy, Inference