Whither Bayes


As I hoped, Marcus Ranum responded to my prior blog post. Most of his response is specific to the DNC hack, but a few of his arguments are aimed at Bayesian statistics in general. If those carry any weight, they collapse both of my blog posts by knocking out a core premise, so I should deal with the generalities before moving on to the more specific critiques.

Depending on past probabilities is intellectual laziness, dishonesty, or ignorance, especially if it’s regarding a matter of any importance.

That’s a fairly standard objection to Bayesian arguments, which has doubtless been offered over and over to little avail. To me, it seems obvious: yes, we build our epistemology of the present on what has happened in the past, but that does not excuse us from exploring and measuring the present.

There’s a fairly strong counter-example to that objection. Jeane Dixon predicted the world would end in 2020. If we only look at the present, we have no way to refute her prediction and should prepare for the end of the world. If instead we include the long history of failed apocalyptic visions, the repeated failures of psychics to predict the future, and the improbability of Jesus Christ’s existence (let alone divinity), then we have plenty of reason to laugh her off and carry on with our lives.

Or consider a classic example courtesy of Jacob Cohen. We possess a test for schizophrenia with a false negative rate of less than 5% and a false positive rate of about 3%. Handed no more information than this, it’d seem fair to conclude that someone with a positive test result has better than a 95% chance of being schizophrenic. Add in the fact that schizophrenia occurs in only 2% of the population, however, and the chance is actually about 40%!
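
To make that arithmetic concrete, here’s the calculation spelled out in a few lines of Python, using the figures above:

```python
# Bayes' theorem applied to the screening test above.
base_rate = 0.02        # prevalence of schizophrenia in the population
sensitivity = 0.95      # chance the test flags an actual schizophrenic (1 - false negative rate)
false_positive = 0.03   # chance the test flags a healthy person

p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)
p_schizophrenia_given_positive = sensitivity * base_rate / p_positive

print(round(p_schizophrenia_given_positive, 2))  # 0.39 -- roughly 40%, not 95%
```

The naive answer confuses the chance of a positive test given schizophrenia with the chance of schizophrenia given a positive test; the base rate is what separates the two.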

It’s neither lazy nor ignorant to include past probabilities; it’s vital if you want to make solid predictions.

This is a nonsense posture of faux Cartesian ignorance. Of course you are not pretending you have no information about the attack! You encoded a great deal of knowledge about the attack in the initial set of options!

We invoke Cartesian ignorance all the time, without issue. The schizophrenia example completely ignores the evidence of the test when it invokes the base rate. The only specific I gave you about Dixon’s apocalyptic vision was that it happens in 2020, yet we had no problem sketching out prior probabilities. Doctors and medical researchers have no idea how prone you, personally, are to disease, but that doesn’t stop them from recommending periodic check-ups based on population statistics.

If you’ve brushed past a detective story at some point in your life, you’ll recall that their first step after examining the crime scene is to look for people who could plausibly have done it. They do this without knowing whether the murder weapon is buried in someone’s back yard, whether a suspect recently purchased rat poison online, and so on. Those detectives don’t consider everyone a suspect, however, even though it’s technically possible that anyone could have committed the crime. They only have finite resources to crack the case (and retirement is coming super-quick!), so they start off with a biased set of hypotheses. That’s OK, provided they don’t mind adding more suspects later and don’t mind being wrong about the ones they currently have.

What Hornbeck is saying is, “Your data should be lightly fried, but not cooked completely. It should not be blackened on both sides, because data that is cooked too thoroughly gets rubbery and hard to chew.” Joking aside, I cannot read that as anything other than an admonition to make sure that your presuppositions are projected through your priors so that you weight the evidence to support the conclusion you already reached.

Nope. Let’s approach this from another angle.

Let’s say we’re handed a coin and want to see whether it’s biased. We know that biased coins are rare, so we set the prior probability of “biased” low and the prior probability of “fair” high. We then settle in to test the coin by flipping it repeatedly.

Will we ever come to a conclusion, one way or the other?

Under Bayesian inference, the answer is “yes.” A toss which lowers the probability of one hypothesis will raise the probability of the other, and as the pile of evidence mounts you’ll inevitably be tugged in one direction. There’s plenty of fine print, of course: the hypotheses have to be mutually exclusive, there has to be an unlimited pile of evidence to draw from, and that evidence has to be gathered without bias. Even if those conditions are violated you can still wind up favouring one hypothesis over the other; it just isn’t guaranteed.

Now, remember that our priors favoured the coin being fair. If the coin is in fact biased, we’ll need extra tosses to overcome that prior. If our priors had instead put the “biased” and “fair” hypotheses on equal footing, we wouldn’t need those extra tosses. Given enough tosses, though, both scenarios wind up agreeing that the coin is biased. The moral of the story: biased priors can be overcome by sufficient evidence.
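
Here’s a minimal simulation of that story. The “biased” hypothesis needs a concrete number to work with, so I’ve assumed a coin that lands heads 70% of the time; everything else is just the odds form of Bayes’ theorem:

```python
import random

random.seed(1)
HEADS_RATE_IF_BIASED = 0.7  # assumed value for the "biased" hypothesis
HEADS_RATE_IF_FAIR = 0.5

def prob_biased(prior_biased, flips):
    """Update P(biased) flip by flip, using the odds form of Bayes' theorem."""
    odds = prior_biased / (1 - prior_biased)
    for heads in flips:
        like_biased = HEADS_RATE_IF_BIASED if heads else 1 - HEADS_RATE_IF_BIASED
        like_fair = HEADS_RATE_IF_FAIR if heads else 1 - HEADS_RATE_IF_FAIR
        odds *= like_biased / like_fair
    return odds / (1 + odds)

# Simulate 200 tosses of a coin that genuinely is biased.
flips = [random.random() < HEADS_RATE_IF_BIASED for _ in range(200)]

print(prob_biased(0.01, flips))  # skeptical prior: biased coins are rare
print(prob_biased(0.50, flips))  # agnostic prior: even odds
```

Run it and both priors, skeptical and agnostic, end up pointing firmly at “biased”; the skewed prior just takes more tosses to get there.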

Now, reread what I originally said:

“If you have a lot of evidence, or all your priors are roughly equal, then the evidence will overwhelm your priors. The less equal they are, or the less evidence you have, the more important it is to get the prior likelihoods right.”

That’s a banal statement of how Bayesian inference works. If it says anything about your priors, it argues you should inflate the priors on unlikely hypotheses, not lower them to overwhelm the evidence.

Finally, plug your assumptions into Bayes Theorem and you will learn which of your assumptions carries the most weight in your model. You gain no actual knowledge about objective reality, but you’ll achieve some clarity regarding your presuppositions and biases. Unfortunately, since you’re biased, you will have left unconsidered those things you did not think to consider, even if those things are vastly more likely than any of your assumptions.

Back in World War II, the Allies were trying to figure out how many tanks Germany was producing each month. Their spies gave them one number, while a statistical analysis of the serial numbers on captured tanks gave them another. Post-war, they had access to German production records and could check which method did better.

| Month       | Statistical estimate | Intelligence estimate | German records |
|-------------|----------------------|-----------------------|----------------|
| June 1940   | 169                  | 1,000                 | 122            |
| June 1941   | 244                  | 1,550                 | 271            |
| August 1942 | 327                  | 1,550                 | 342            |
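
The “statistical analysis” here is usually presented via the serial-number estimate: take the largest serial number you’ve captured and add the average gap between serials. (That’s the frequentist form; there are Bayesian treatments as well, but the point about beating the spies stands either way.) A sketch with made-up serial numbers:

```python
def estimated_production(serials):
    """Serial-number estimate of total production: the largest serial seen,
    plus the average gap between captured serials, minus one."""
    biggest, count = max(serials), len(serials)
    return biggest + biggest / count - 1

# Hypothetical serial numbers from four captured tanks.
print(estimated_production([19, 40, 42, 60]))  # 74.0
```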

Alan Turing reinvented Bayesian analysis to crack Axis codes. About a decade later, insurers reinvented it again to set insurance premiums. Bayesian methods are routinely used during search and rescue: investigators put forward a number of hypotheses about where the missing plane or boat could be, search the more probable areas, then update the probabilities and head back out to sea.
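
That search-and-rescue loop is simple enough to sketch. Assume three candidate areas and an 80% chance of spotting the wreck when we search the right one (both numbers made up for illustration):

```python
# Prior probabilities that the wreck lies in each area (made up for the demo).
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
P_DETECT = 0.8  # assumed chance of spotting the wreck if we search the right area

def after_failed_search(probs, searched):
    """Bayes update after searching one area and finding nothing."""
    posterior = dict(probs)
    posterior[searched] *= 1 - P_DETECT  # a miss makes that area less likely
    total = sum(posterior.values())
    return {area: p / total for area, p in posterior.items()}

print(after_failed_search(priors, "A"))
# Area A falls from 0.50 to roughly 0.17; area B is now the best bet for tomorrow.
```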

Nowadays, Bayesian methods are used for everything from spam filtering to astronomy. “emcee” is a Python package for Markov chain Monte Carlo sampling, a workhorse of Bayesian analysis. Despite being a mere four years old, it’s been used in over a thousand scientific papers.
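
For a taste of what that looks like in practice, here’s a minimal emcee run (using the version 3 interface) on a made-up problem: sampling a coin’s heads-rate after seeing 7 heads in 10 flips, under a flat prior:

```python
import numpy as np
import emcee

heads, flips = 7, 10  # made-up data: 7 heads out of 10 tosses

def log_prob(theta):
    """Log posterior for the heads-rate p, with a flat prior on (0, 1)."""
    p = theta[0]
    if not 0.0 < p < 1.0:
        return -np.inf
    return heads * np.log(p) + (flips - heads) * np.log(1.0 - p)

nwalkers, ndim = 32, 1
start = np.random.uniform(0.3, 0.7, size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(start, 2000)

samples = sampler.get_chain(discard=500, flat=True)
print(samples.mean(), samples.std())  # posterior mean around 0.67
```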

Over two centuries ago, Thomas Bayes worked through an example of this style of reasoning capturing reality. The framework he helped create has carried on that tradition, proving itself an accurate predictor of reality in many branches of science.

That’s a huge problem for Bayesian analysis: you’ve got a bit of evidence stuck in your priors that one person appears willing to accept, while another is not. Does that mean you just go “too bad” and crunch your priors – garbage in, garbage out – or do you stop the process and collect more evidence? I, of course, would argue for the latter, and nothing but the latter.

As I pointed out earlier, you can drown out bad priors with further evidence. If you don’t have any extra evidence, you can run the analysis under both priors, as with the coin example above; if both come to the same basic conclusion, it’s a distinction without a difference. Alternatively, lay out your reasoning for each prior and see which premises or data points are causing the divergence.

This is a problem, I’ll grant, but it’s not a huge one.

I’m sure that if Hornbeck and I sat down with a lot of coffee and a few hours, and walked through all the evidence and discussed it and weighed it, we’d have a much better understanding of our views and the situation – that’s how thinking about facts is done – that’s the important analytical process of weighing evidence and understanding the impact of facts. I see throwing a bunch of ‘priors’ and a Bayesian probability out on the table as a cheap way of bypassing the hard work: “here are my conclusions based on my carefully selected facts.”

There’s a lively debate on whether or not our brains are Bayesian. Proponents point out that Bayesian algorithms behave a lot like our brains do, and in some cases outperform neural-net approaches. Opponents argue that merely acting Bayesian doesn’t prove they are Bayesian, and that Bayesian explanations are too flexible and prone to post-hoc rationalisation. I’m in the latter camp, but for a different reason: a finite computational device cannot hope to handle an infinite quantity, yet a true Bayesian analysis plays out over an unlimited and possibly infinite number of hypotheses. To actually implement a form of Bayesian inference, you have to take shortcuts, and those can introduce bias.

Conversely, if you want to drill down to the unvarnished truth, you should explicitly lay out a Bayesian analysis rather than rely on intuition. It is the very opposite of a cheap trick.

Enough of the generalities! Next time, I tackle the specifics of what Sam Biddle and Ranum have argued.