Keybase sort of feels like an instant messenger client, but it’s automatically end-to-end encrypted and also has encrypted cloud storage, which I find extremely useful. The cloud storage is automatically mounted as a directory in your file system, making it very easy to use. You can also have encrypted private git repositories, create teams to work on them or share files with, and so on.
The main downside for me is just that not many people are on it. My wife, a good friend, and a close work collaborator are, so it’s still very useful. Recently I wanted to email an acquaintance about a very sensitive matter and I’d have liked it to be encrypted. It would have been trivial had he been on Keybase. It’d be great to see more of you on there.
LBRY is a blockchain-based file distribution platform and marketplace. You can approximately view it as an uncensorable YouTube alternative (it’s mostly videos, though you can actually use it for any file), with private property rights for the name/URL of a file, and distributed storage for the file (sort of analogous to BitTorrent, but more decentralised, using voodoo that I don’t understand). So you can set a price if you want, buy content that is for sale, or tip your favourite creators.
Because of the decentralised nature of LBRY, some features you might expect (commenting, for example) are harder to implement and not there yet. But they’re working on it.
Brave is a web browser that has several awesome privacy-respecting features. Ads are blocked by default, and eventually you’ll be able to (optionally) earn BAT (Basic Attention Token) for looking at ads which companies have purchased with BAT. There are also two levels of private browsing: normal private browsing and super-duper private browsing with Tor, which I’ve never used on its own (it seemed too hard). I love being able to use Tor with a simple click.
An intuitive interpretation of this quantity is, loosely speaking, the generalisation of “log-volume” to non-uniform distributions. If you changed parameterisation from x to x’, you’d stretch the axes and end up with different volumes, so this quantity is not invariant under changes of coordinates. When using this quantity, a notion of volume is implicitly brought in, based on a flat measure that would not remain flat under arbitrary coordinate changes. In principle, you should give the measure explicitly (or use relative entropy/KL divergence instead), but it’s no big deal if you don’t, as long as you know what you’re doing.
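Concretely, the quantity is the differential entropy, and under an invertible change of coordinates x’ = g(x) it picks up a Jacobian term:

```latex
h(X) = -\int p(x) \log p(x)\, dx,
\qquad
h(X') = h(X) + \mathbb{E}\!\left[\log\left|g'(X)\right|\right].
```

The invariant alternative is the relative entropy $-\int p(x)\log\frac{p(x)}{m(x)}\,dx$ against an explicit measure $m(x)$: the measure transforms with the same Jacobian as $p(x)$, so the ratio (and hence the integral) stays put.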
For some reason, in this particular situation only, people wring their hands over the lack of invariance and say this makes the differential entropy wrong or bad or something. For example, the Wikipedia article states
Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions. Unfortunately, Shannon did not derive this formula, and rather just assumed it was the correct continuous analogue of discrete entropy, but it is not.^{[citation needed]} The actual continuous version of discrete entropy is the limiting density of discrete points (LDDP).
(Aside: this LDDP thing comes from Jaynes, who I think was awesome, but that doesn’t mean that he was always right or that people who’ve read him are always right)
Later on, the Wikipedia article suggests something is strange about differential entropy because it can be negative, whereas discrete Shannon entropy is non-negative. Well, volumes can be less than 1, whereas counts of possibilities cannot be. Scandalous!
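For instance, if $X \sim \mathrm{Uniform}(0, a)$, then

```latex
h(X) = -\int_0^a \frac{1}{a} \log\frac{1}{a}\, dx = \log a,
```

which is negative whenever $a < 1$, for exactly the same reason that a box with side length less than one has negative log-volume.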
This is probably a side effect of the common (and unnecessary) shroud of mystery surrounding information theory. Nobody would be tempted to edit the Wikipedia page on circles to say things like “the area of a circle is not really πr², because if you do a nonlinear transformation the area will change.” On second thoughts, many maths Wikipedia pages do degenerate into fibre bundles rather quickly, so maybe I shouldn’t say nobody would be tempted.
Running an instance of R inside C++ is fairly easy to do thanks to RInside, but do not expect it to compete with pure C++ for speed, unless your R likelihood function is heavily optimised and dominates the computational cost so that the overheads are irrelevant. That’s not the case in the example I implemented yesterday.
This post contains instructions to get everything up and running and to implement models in R. Since I’m not very good at R, some of this is probably more complicated than it needs to be. I’m open to suggestions.
Install DNest4
First, git clone and install DNest4 by following along with my quick start video. Get acquainted with how to run the sampler and what the output looks like.
Look at the R model code
Then, navigate to DNest4/code/Templates/RModel to see the example of a model implemented in R. There’s only one R file in that directory, so open it and take a look. There are three key parts to it. The variable num_params is the integer number of parameters in the model. These are assumed to have Uniform(0, 1) priors, but the function from_uniform is used to apply transformations to make the priors whatever you want them to be. The example is a simple linear regression with two vague normal priors and one vague log-uniform prior. It’s the same example as in the paper. Then there’s the log likelihood, which is probably the easiest part to understand. I’m using the traditional iid gaussian sampling distribution for the noise around the regression line, with unknown standard deviation.
Fiddly library stuff
Make sure the R packages Rcpp and RInside are installed. In R, do this to install them:
> install.packages("Rcpp")
> install.packages("RInside")
Once this is done, find where the header files R.h, Rcpp.h, and RInside.h are on your system, and put those paths on the appropriate lines in DNest4/code/Templates/RModel/Makefile. Then, find the library files libR.so and libRInside.so (the extension is probably different on a Mac) and put their paths in the Makefile as well as adding them to your LD_LIBRARY_PATH environment variable. Enjoy the 1990s computing nostalgia.
Compile
Run make to compile the example, then execute main to run it. Everything should run just the same as in my quick start video, except slower. Don’t try to use multiple threads, and enjoy writing models in R!
Learning basic Haskell has given me another way to think and talk about this issue without getting mired in metaphysics or discussions about subjectivity and objectivity. This post about random sequences is an instance of what I’m talking about here.
Haskell is a strongly typed language, like C++. Whenever you define something, you always have to say what type of thing it is. This is pretty much the same idea as in mathematics when you say what set something is in. For example, in the following code x is an integer and f is a function that takes two integers as input and returns a string. Each definition is preceded by a type signature which just describes the type. Functions and variables (i.e., constants ;-)) both have types.
x :: Int
x = 5

f :: Int -> Int -> String
f x y = if x > y then "Hello" else "Goodbye"
Jared wrote his PhD dissertation about how Haskell’s sophisticated type system can be used to represent probability distributions in a very natural way. For example, in your program you might have a type Prob Int to represent probability distributions over Ints, Prob Double to represent probability distributions over Doubles, and so on. The Prob type has a type parameter, like templates in C++ (e.g., std::vector<T> defines vectors of all types, and std::vector<int> and std::vector<double> have definite values for the type parameter T). The Prob type has instances for Functor, Applicative, and Monad, for those who know what that means. It took me about six months to understand this, and now that I roughly do, it all seems obvious. But hopefully I can make my point without much of that stuff.
Many instances of the mind projection fallacy turn out to be simple type errors. More specifically, they are attempts to apply functions which require an argument of type Prob a to input that’s actually of type a. For example, suppose I define a function which tests whether a distribution over a vector of doubles is gaussian or not:
isGaussian :: Prob (Vector Double) -> Bool
If I tried to apply that to a Vector Double, I’d get a compiler error. There’s no mystery to that, or need for any confusion as to why you can’t do it. It’s like trying to take the logarithm of a string.
Some of the confusion is understandable because our theories have many functions in them, with some functions being similar to each other in name and concept. For example, I might have two similarly-named functions isGaussian1 and isGaussian2, which take a Prob (Vector Double) and a Vector Double as input respectively:
isGaussian1 :: Prob (Vector Double) -> Bool
isGaussian1 pdf = if pdf is a product of iid gaussians then True else False

isGaussian2 :: Vector Double -> Bool
isGaussian2 vec = if a histogram of vec's values looks gaussian then True else False
It would be an instance of the mind projection fallacy if I started mixing these two functions up willy-nilly. For example, if I say “my statistical test assumed normality, so I looked at a histogram of the data to make sure that assumption was correct”, I am jumping from isGaussian1 to isGaussian2 mid-sentence, assuming that they are connected (which is often true) and that the connection works in the way I guessed it does (which is often false).
Here’s another example using statistics notions. I could define two standard deviation functions, for the so-called ‘population’ and ‘sample’ standard deviations:
sd1 :: Prob Double -> Double
sd2 :: Vector Double -> Double
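As a concrete sketch, here’s what the pair might look like with a toy weighted-list representation of Prob (a stand-in for illustration, not the representation from Jared’s dissertation) and plain lists instead of Vector, to keep it dependency-free:

```haskell
-- Toy representation: a discrete distribution as (value, probability) pairs.
-- This is an illustrative stand-in, not the real Prob type.
newtype Prob a = Prob [(a, Double)]

-- 'Population' standard deviation: a property of the distribution itself.
sd1 :: Prob Double -> Double
sd1 (Prob xs) = sqrt (ex2 - ex * ex)
  where
    ex  = sum [x * p     | (x, p) <- xs]  -- E[X]
    ex2 = sum [x * x * p | (x, p) <- xs]  -- E[X^2]

-- 'Sample' standard deviation: a property of a data set (n - 1 denominator).
sd2 :: [Double] -> Double
sd2 xs = sqrt (sum [(x - m) ^ 2 | x <- xs] / fromIntegral (n - 1))
  where
    n = length xs
    m = sum xs / fromIntegral n

main :: IO ()
main = do
  print (sd1 (Prob [(0, 0.5), (1, 0.5)]))  -- population sd of a fair coin
  print (sd2 [1, 2, 3])                    -- sample sd of a small data set
```

The types alone stop you from passing a data set where a distribution is required, which is the whole point.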
Statisticians are typically good at keeping these conceptually distinct, since they learn them at the very beginning of their training. On the other hand, here’s an example that physicists tend to get mixed up about:
entropy :: Prob Microstate -> Double
disorder :: Microstate -> Double
While there is no standard definition of disorder, no definition can make it equivalent to Gibbs/Shannon entropy, which is a property of the probability distribution, not the state itself. Physicists with their concepts straight should recognise the compiler error right away.
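To make the type distinction concrete, here’s a minimal sketch with the same sort of toy weighted-list Prob; the Microstate type and the disorder function are made up for illustration, since (as noted) there is no standard definition of disorder:

```haskell
-- Toy representation of a discrete distribution (illustration only).
newtype Prob a = Prob [(a, Double)]

-- Hypothetical microstate: e.g. occupation numbers of some cells.
type Microstate = [Int]

-- Gibbs/Shannon entropy in nats: a functional of the whole distribution.
entropy :: Prob Microstate -> Double
entropy (Prob xs) = negate (sum [p * log p | (_, p) <- xs, p > 0])

-- A made-up 'disorder' measure of a single state: count of occupied cells.
disorder :: Microstate -> Double
disorder = fromIntegral . length . filter (/= 0)

main :: IO ()
main = do
  print (entropy (Prob [([0], 0.5), ([1], 0.5)]))  -- log 2 for two equiprobable microstates
  print (disorder [1, 0, 2])
```

Feeding a single Microstate to entropy, or a Prob Microstate to disorder, simply fails to typecheck, which is exactly the mix-up the compiler would catch.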
I mentioned above that Prob has a Functor instance. That means you can take a function with type a -> b and apply it to a Prob a, yielding a Prob b (this is analogous to how you might apply a function to every element of a vector, yielding a vector whose elements have the new type). The interpretation in probability is that you get the probability distribution for the transformed version of the input variable. So, any time you have a function f :: a -> b, you also get a function fmap f :: Prob a -> Prob b. This doesn’t get you a way around the mind projection fallacy, though. For that, the type would need to be Prob a -> b.
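With a toy weighted-list Prob (again a stand-in, not the real library), the Functor instance is a one-liner, and it shows that fmap pushes a function forward through the distribution rather than letting you escape it:

```haskell
-- Toy discrete distribution (illustration only).
newtype Prob a = Prob [(a, Double)]

instance Functor Prob where
  -- Transform each outcome; the probabilities are untouched.
  fmap f (Prob xs) = Prob [(f x, p) | (x, p) <- xs]

-- Example: the distribution of x^2 when x is uniform on {-1, 0, 1}.
squared :: Prob Int
squared = fmap (^ 2) (Prob [(-1, 1/3), (0, 1/3), (1, 1/3)])

main :: IO ()
main = case squared of
  Prob xs -> print xs
```

Note that this naive representation doesn’t merge duplicate outcomes (the value 1 appears twice in squared with probability 1/3 each); a real implementation would need to be cleverer about that.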
I got one of those working the other day, and was rather amused at some of the output. Check out these residuals of one posterior sample of a gravitational lens fit (bottom right). The “correlated noise” likelihood is quite happy to call this a decent fit!