Last night, I decided to bite the bullet and add yet another method to implement models in DNest4, this time using R. Statisticians know R, so it’s probably a good idea to support their language in some form, and this adds one more option to the list of ways of using DNest4.
Running an instance of R inside C++ is fairly easy to do thanks to RInside, but do not expect it to compete with pure C++ for speed, unless your R likelihood function is heavily optimised and dominates the computational cost so that overheads are irrelevant. That’s not the case in the example I implemented yesterday.
This post contains instructions to get everything up and running and to implement models in R. Since I’m not very good at R, some of this is probably more complicated than it needs to be. I’m open to suggestions.
First, git clone and install DNest4 by following along with my quick start video. Get acquainted with how to run the sampler and what the output looks like.
Look at the R model code
Then, navigate to DNest4/code/Templates/RModel to see the example of a model implemented in R. There’s only one R file in that directory, so open it and take a look. There are three key parts of it. The variable num_params is the integer number of parameters in the model. These are assumed to have Uniform(0, 1) priors, but the function from_uniform is used to apply transformations to make the priors whatever you want them to be. The example is a simple linear regression with two vague normal priors and one vague log-uniform prior. It’s the same example from the paper. Then there’s the log likelihood, which is probably the easiest part to understand. I’m using the traditional iid gaussian prior for the noise around the regression line, with unknown standard deviation.
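The example itself is in R, but the idea behind from_uniform is language-agnostic: map a Uniform(0, 1) coordinate to the desired prior by applying the inverse of that prior’s CDF. Here’s a minimal sketch of that idea in Haskell (my own illustrative names, not DNest4’s actual code), for the log-uniform case used in the example:

```haskell
-- Hypothetical sketch of the from_uniform idea: transform a
-- Uniform(0, 1) draw into a draw from the desired prior by
-- applying the inverse CDF of that prior.

-- Log-uniform prior between lo and hi: uniform in log space.
logUniformFromU :: Double -> Double -> Double -> Double
logUniformFromU lo hi u = exp (log lo + u * (log hi - log lo))

-- Uniform prior between a and b, for comparison.
uniformFromU :: Double -> Double -> Double -> Double
uniformFromU a b u = a + u * (b - a)
```

For instance, logUniformFromU applied to (1e-3, 1e3) with u = 0.5 returns 1.0, the midpoint in log space, which is how a vague log-uniform prior should behave.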
Fiddly library stuff
Make sure the R packages Rcpp and RInside are installed. In R, do this to install them:
> install.packages("Rcpp")
> install.packages("RInside")
Once this is done, find where the header files R.h, Rcpp.h, and RInside.h are on your system, and put those paths on the appropriate lines in DNest4/code/Templates/RModel/Makefile. Then, find the library files libR.so and libRInside.so (the extension is probably different on a Mac) and put their paths in the Makefile as well as adding them to your LD_LIBRARY_PATH environment variable. Enjoy the 1990s computing nostalgia.
Run make to compile the example, then execute main to run it. Everything should run just the same as in my quick start video, except slower. Don’t try to use multiple threads, and enjoy writing models in R!
How good is this?
I enjoyed this critique on the Lunduke Show, which I found via LBRY.
Ed Jaynes frequently criticised what he called the mind projection fallacy. This is the implicit assumption that our concepts of, say, gaussianity, randomness, and so on, reflect properties to be found “out there in nature”, whereas they often reside only in the structure of our thinking. I’ve brought this up a few times on the blog, in old posts such as Stochastic and Testing Model Assumptions.
Learning basic Haskell has given me another way to think and talk about this issue without getting mired in metaphysics or discussions about subjectivity and objectivity. This post about random sequences is an instance of what I’m talking about in this post.
Haskell is a strongly typed language, like C++. Whenever you define something, you always have to say what type of thing it is. This is pretty much the same idea as in mathematics when you say what set something is in. For example, in the following code,

x :: Int
x = 5

f :: Int -> Int -> String
f x y = if x > y then "Hello" else "Goodbye"

x is an integer and f is a function that takes two integers as input and returns a string. Each definition is preceded by a type signature which just describes the type. Functions and variables (i.e., constants ;-)) both have types.
Jared wrote his PhD dissertation about how Haskell’s sophisticated type system can be used to represent probability distributions in a very natural way. For example, in your program you might have a type Prob Int to represent probability distributions over Ints, a type Prob Double to represent probability distributions over Doubles, and so on. The Prob type has a type parameter, like templates in C++ (e.g., std::vector<T> defines vectors of all types, and std::vector<double> has a definite value for the type parameter T). The Prob type has instances for Functor and Monad, for those who know what that means. It took me about six months to understand this, and now that I roughly do, it all seems obvious. But hopefully I can make my point without much of that stuff.
Many instances of the mind projection fallacy turn out to be simple type errors. More specifically, they are attempts to apply functions which require an argument of type Prob a to input that’s actually of type a. For example, suppose I define a function which tests whether a distribution over a vector of doubles is gaussian or not:

isGaussian :: Prob (Vector Double) -> Bool

If I tried to apply that to a Vector Double, I’d get a compiler error. There’s no mystery to that, or need for any confusion as to why you can’t do it. It’s like trying to take the logarithm of a string.
Some of the confusion is understandable because our theories have many functions in them, with some functions being similar to each other in name and concept. For example, I might have two similarly-named functions, isGaussian1 and isGaussian2, which take a Prob (Vector Double) and a Vector Double as input respectively:

isGaussian1 :: Prob (Vector Double) -> Bool
isGaussian1 pdf = if pdf is a product of iid gaussians then True else False

isGaussian2 :: Vector Double -> Bool
isGaussian2 vec = if a histogram of vec's values looks gaussian then True else False

It would be an instance of the mind projection fallacy if I started mixing these two functions up willy-nilly. For example, if I say “my statistical test assumed normality, so I looked at a histogram of the data to make sure that assumption was correct”, I am jumping from isGaussian1 to isGaussian2 mid-sentence, assuming that they are connected (which is often true) and that the connection works in the way I guessed it does (which is often false).
Here’s another example using statistics notions. I could define two standard deviation functions, for the so-called ‘population’ and ‘sample’ standard deviations:
sd1 :: Prob Double -> Double
sd2 :: Vector Double -> Double
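To make the distinction concrete, here is a toy sketch of the two functions (my own illustration, representing a finite distribution as a list of value–probability pairs rather than a full Prob type):

```haskell
-- sd1: standard deviation of a distribution itself (toy finite
-- version: the distribution is a list of (value, probability) pairs).
sd1 :: [(Double, Double)] -> Double
sd1 dist = sqrt (e2 - mu * mu)
  where
    mu = sum [p * x     | (x, p) <- dist]  -- E[X]
    e2 = sum [p * x * x | (x, p) <- dist]  -- E[X^2]

-- sd2: 'sample' standard deviation of data (n - 1 denominator).
sd2 :: [Double] -> Double
sd2 xs = sqrt (sum [(x - m) ^ 2 | x <- xs] / fromIntegral (n - 1))
  where
    n = length xs
    m = sum xs / fromIntegral n
```

The two take entirely different kinds of input: sd1 needs the whole distribution, while sd2 only needs a data vector.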
Statisticians are typically good at keeping these conceptually distinct, since they learn them at the very beginning of their training. On the other hand, here’s an example that physicists tend to get mixed up about:
entropy :: Prob Microstate -> Double
disorder :: Microstate -> Double
While there is no standard definition of disorder, no definition can make it equivalent to Gibbs/Shannon entropy, which is a property of the probability distribution, not the state itself. Physicists with their concepts straight should recognise the compiler error right away.
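A minimal sketch makes the point, again using a toy finite representation of a distribution (my own illustration, not a physics library):

```haskell
-- Gibbs/Shannon entropy in nats: a function of the whole
-- distribution, here a toy list of (state, probability) pairs.
entropy :: [(a, Double)] -> Double
entropy dist = negate (sum [p * log p | (_, p) <- dist, p > 0])
```

Two equally likely microstates give entropy log 2; a single certain microstate gives 0. No function of one microstate alone could compute either value, which is exactly the compiler error above.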
I mentioned above that Prob has a Functor instance. That means you can take a function with type a -> b and apply it to a Prob a, yielding a Prob b (this is analogous to how you might apply a function to every element of a vector, yielding a vector output where every element has the new type). The interpretation in probability is that you get the probability distribution for the transformed version of the input variable. So, any time you have a function f :: a -> b, you also get a function fmap f :: Prob a -> Prob b. This doesn’t get you a way around the mind projection fallacy, though. For that, the type would need to be Prob a -> b.
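Here is a toy, finite version of that Functor instance (my own sketch; a real Prob type like the one in Jared’s thesis is more sophisticated):

```haskell
-- A finite distribution: outcomes paired with probabilities.
newtype Prob a = Prob { outcomes :: [(a, Double)] }

instance Functor Prob where
  -- fmap transforms the outcomes and leaves the probabilities attached.
  fmap f (Prob xs) = Prob [(f x, p) | (x, p) <- xs]

-- A fair die, and (via fmap) the distribution of its square.
die :: Prob Int
die = Prob [(k, 1 / 6) | k <- [1 .. 6]]

squaredDie :: Prob Int
squaredDie = fmap (\k -> k * k) die
```

Note that fmap only ever produces another distribution; nothing here lets you collapse a Prob a down to a bare a.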
A lot of my research over the years has involved fitting images. Usually, I use the traditional assumption that the conditional prior for the data pixels given the model pixels is iid normal with mean zero. But in some cases one might want to use a correlated distribution instead. That’s a technical challenge which I’ve been trying to solve from several angles (though good solutions might be well known to people who know these things).
I got one of those working the other day, and was rather amused at some of the output. Check out these residuals of one posterior sample of a gravitational lens fit (bottom right). The “correlated noise” likelihood is quite happy to call this a decent fit! 🙂
A relative of mine recently remarked that consumer products involving less waste (that is, less plastic packaging and so on) usually cost more. The same is true when you compare electric cars to petrol cars, and I’m sure you can think of many other examples. But are we missing something here?
Suppose I can choose between brand A and brand B for my purchases of some product, and both meet my purposes equally well. Suppose also that A comes with lots of plastic packaging and costs $10, and B does not, yet costs $13. If I don’t care about the plastic (both in its convenient or annoying effects on me, and in its real or imagined environmental impact once discarded), it would be a bad decision to buy product B, at least on my side of the transaction. I could use the $3 saved by buying A for something else I need, or even donate it to the Against Malaria Foundation.
On the other hand, if we think that the harm caused by discarding this particular unit of plastic packaging is worth the sacrifice of the $3, then purchasing B would be the right decision. In order to really decide, we would somehow have to know the extent of this impact and to what extent someone does value it, or ought to value it (that is, weigh it against the benefits of the $3). This is the kind of situation that Economics 101 usually presents with its solution of a Pigouvian tax to offset the external cost, but in my story I’ve put that decision in the hands of the purchaser (Economics 101 doesn’t usually mention the fact that the people setting the level of the tax might not have every incentive to do so accurately, if it is even possible to define that).
However, here’s a tempting line of reasoning that I think is wrong. We should buy product B because it’s less wasteful, in that it takes fewer resources to produce. Just look at the wasteful packaging A has and B doesn’t have! But this neglects the fact that the price is conveying information about things we could rarely learn in all their complexity, such as how valuable the alternative uses are of the resources that went into producing A and B. If B is more expensive, then it is very plausible that it takes more resources to produce it. Perhaps it takes some scarce specialised ingredient, or highly skilled labour (which also has alternative uses), etc. I am reminded of a time a friend lamented the tremendous resources going into producing plastic cutlery, which is so resource intensive that you can buy huge packets of it for a few dollars. If plastic cutlery were truly ridiculously wasteful, you’d be having trouble budgeting for it and find the need to cut back. But most people don’t have to worry that much about their plastic cutlery expenditure.
In the long run it would be exciting if electric cars became cheaper than petrol ones, but for now they aren’t, which means they almost certainly are taking more resources to produce. The external costs here might justify the difference, and many people make that case (more catastrophic views of climate change would imply this, while it’s-one-issue-among-many views would not). But the case isn’t made by pointing at the lack of a tailpipe and forgetting to weigh that against the thousands of dollars at the other end of the transaction. Similarly, if a kind of paper costs twice as much, a business is putting itself at slightly higher risk of going bankrupt by purchasing that kind of paper. Ironically, pursuing environmental sustainability at any price (literally) can mean both wasting more resources and making activities less sustainable.