As you probably know, I use Diffusive Nested Sampling (the latest implementation of which is DNest4) to do almost all of my data analysis. It works on pretty much all problems I throw at it, as long as the likelihood function is fairly fast to evaluate. One barrier to increasing its popularity has been that you needed to know C++ to implement models (priors and likelihoods, i.e., models *of prior information*). You also had to implement Metropolis proposals yourself, which can be a fair bit of work. Things have improved lately on the language front, as you can implement models in Python and Julia. You will lose out on computational efficiency, though. C++ is still the best way to implement a model, if you can do it.

In C++, proposals for hierarchical models can get a bit annoying, and the RJObject class helps a bit with that. But still, hardly any of my users are going to be good at C++ and want to use a template class to run a routine data analysis. A simpler solution is needed.

Since I learned about them, I’ve admired the model specification languages used by JAGS, OpenBUGS, and Stan, and have wanted the option to specify a model in something simple like these, which then gets translated into C++ code with sensible default proposals, so you can run DNest4.

Now it exists! It’s experimental, and not covered in the DNest4 paper, but is basically working. As an example, here’s a simple data analysis demo. Some JAGS code for a simple linear regression with naive uniform priors is:

model { # Slope and intercept m ~ dunif(-100, 100) b ~ dunif(-100, 100) # Noise standard deviation log_sigma ~ dunif(-10, 10) sigma <- exp(log_sigma) # p(data | parameters) for(i in 1:N) { y[i] ~ dnorm(m*x[i] + b, 1/sigma^2) } }

And an R list (approximately the same as a Python dictionary) containing some data is

data = list(x=c(1, 2, 3, 4, 5), y=c(1, 2, 3, 3.9, 5.1), N=5)

Now, how would I implement the same thing in DNest4? I could write a C++ class, very much like the one given in the paper (and for a research problem, I would). But an alternative way is to use DNest4’s experimental ‘model builder’, which is the JAGS-like interface (but in Python) I’ve been wanting. Here is equivalent model code for the linear regression. Writing this feels kind of like a cross between using JAGS and using daft, the DAG-plotter by Foreman-Mackey and Hogg.

First, the data needs to be in a Python dictionary (containing floats, ints, and numpy arrays of floats & ints):

data = {"x": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),\ "y": np.array([1.0, 2.0, 3.0, 3.9, 5.1]),\ "N": 5}

Then, construct a ‘Model’:

# Create the model model = bd.Model() # Slope and intercept model.add_node(bd.Node("m", bd.Uniform(-100.0, 100.0))) model.add_node(bd.Node("b", bd.Uniform(-100.0, 100.0))) # Noise standard deviation model.add_node(bd.Node("log_sigma", bd.Uniform(-10.0, 10.0))) model.add_node(bd.Node("sigma", bd.Delta("exp(log_sigma)"))) # p(data | parameters) for i in range(0, data["N"]): name = "y{index}".format(index=i) mean = "m*x{index} + b".format(index=i) model.add_node(bd.Node(name, bd.Normal(mean, "sigma"),\ observed=True))

You use the add_node() function to add nodes (parameter or data quantities) to the model. For the Node constructor, the first argument is the name, and the second is the prior distribution.

The loop part makes nodes for the data, called y0, y1, y2, and so on, by using Python strings. To signal that a node is observed (so the C++ generated needs to put a term in the log likelihood rather than in the prior-related parts of the code), pass observed=True.

To generate and compile the C++ code, the Python script needs to be in the code/Templates/Builder directory, and then the following Python will generate the C++:

# Create the C++ code bd.generate_h(model, data) bd.generate_cpp(model, data) # Compile the C++ code so it's ready to go import os os.system("make")

Then you can run the executable just as you could if you had implemented the model by writing a C++ class. Sweet!

The code/Templates/Builder directory includes this example (in for_blog.py) and a few more, including the old BUGS “Rats” example. Feel free to ask me for more details about this feature if you would like help using it!

Pingback: R-based model specification for DNest4 | Plausibility Theory