I made an oblique reference to Bayesian arguments in a postscript to a posting, [stderr] and hadn’t realized that HJ Hornbeck has already been digging into exactly that topic, using exactly that example. [hj1] [hj2] With all respect to HJ, I’m going to use his example as an opportunity to critique some of how Bayesian arguments are used in the skeptical community.
First and foremost, I don’t think Bayesian arguments bring anything particularly valuable to the discussion – they’re a way of arguing about our subjective belief in an unknown – they don’t add anything to our ability to gain actual knowledge. So, in the case of the Russian hacking incident, I argue that we’d be better off investing our time in studying actual forensic evidence about the incident than hypothesizing about how well our assumptions about the past are reflected in the present. If everyone believes that the Russians hacked the DNC, it still doesn’t mean that the Russians did; it just means a lot of people believe they did. Flipping that argument onto its head, I am inclined to dismiss the Bayesian argument with a load of snark, i.e.: “Those who don’t know, hypothesize. Those who are hypothesizing would do better to examine actual evidence.” It’s somewhat defensible in a situation like “was Jesus a real person?” where the evidence has been pretty thoroughly mined out, but with respect to something like the Russian hacking accusations we would do better to insist on evidence than to flounder about in a sea of conditional hypotheticals.
With regard to the Russian hacking, I’ve actually performed forensic investigations and have studied attribution carefully enough to know when I am confident attributing an attack, and what standard of evidence I’d require in order to be convinced. My position is that, since not enough evidence has been presented to make a solid attribution, using statistical models to argue about whether it was the Russians or not would be professional malpractice; the very purpose of attaining expert knowledge is to be able to make the best assessments of objective facts that we can. Depending on past probabilities is intellectual laziness, dishonesty, or ignorance, especially if it’s regarding a matter of any importance.
That’s a fairly standard objection to Bayesian arguments, which has doubtless been offered over and over to little avail. To me, it seems obvious: yes, we build our epistemology of the present on what has happened in the past, but that does not excuse us from exploring and measuring the present. If we have weak priors then the only thing that makes sense is to gather more information. If we have very solid priors, then it’s probably a waste of time to gather more information. When we’re in between, it seems that “gather more information” is the only path to truth, so why spend one’s time on statistical conjecture? 
Our first step is to gather up all the relevant hypotheses. There are a lot more choices than "the Kremlin did it," such as:
B) The Chinese government did it.
C) North Korea’s government did it.
D) A skilled independent hacking team did it.
E) The CIA did it.
F) The NSA did it.
G) I did it.
This is a good example of how “garbage in, garbage out” happens with Bayesian reasoning. For one thing, there are a lot more choices than the expanded list Hornbeck presents above. I believe that Hornbeck is being intellectually honest, and is not deliberately manipulating his inputs in order to arrange a predetermined output, but intent really doesn’t matter: the list is heavily biased because it accidentally or deliberately leaves off some potentially high-probability choices.
I respect Hornbeck’s intellectual honesty, here, as he includes a refutation of his own method in its exposition:
We should be as liberal as possible with our hypotheses, as it’s the easiest way to prevent bias. I could easily rig things by only including two hypotheses, A and G, but if I allow every hypothesis plus the kitchen sink then something I consider wildly unlikely could still become the most likely hypothesis of them all. The hypotheses should be as specific as possible (“bad people did it” won’t give you useful results) but not overly specific (“Sergei Mikhailov did it” is probably false, at best he led a team of hackers). When in doubt about a hypothesis, add it in.
I’ve always found that the easiest way to understand this stage of the process is to recognize it as pretty much a bog-standard social science survey. Think of it as a multiple choice question, like Hornbeck’s above – a question you frame, and then answer for yourself. I can see how one could argue this is a useful intellectual tool for an individual to explore their personal understanding of a problem, but I don’t see how it can or should possibly generalize beyond the individual’s experience.
Hornbeck has left off at least two cases that I’d estimate as quite likely – and, if I exerted myself, I could probably go on for several pages of further possibilities, making the assignment of prior probabilities even more difficult:
H) Some unknown person or persons did it
I) An unskilled hacker or hackers who had access to ‘professional’ tools did it
J) Marcus Ranum did it
H) ought to pose a serious problem for Bayesian probabilities, because the brute fact is that the vast majority of successful phishing attacks and exploitations are carried out by persons unknown, who are never identified. If you think about the problem in terms of “survey design” as I advocate above, you can see that Hornbeck’s list is unconsciously steering the reader away from the one most obvious answer, “none of the above.” Once “unknown” is left off the list, the game is over before it starts. Let me throw out some ‘priors’:
- In 2015, there were 1,966,324 registered notifications about attempted malware infections that aimed to steal money via online access to bank accounts. [kaspersky]
- Kaspersky Lab’s web antivirus detected 121,262,075 unique malicious objects: scripts, exploits, executable files, etc.
- 34.2% of user computers were subjected to at least one web attack over the year.
Within that vast stream of hostile activity, there are Russians. The amount is, of course, unknown. Even if we accepted, for the sake of argument, that the DNC attack was 100,000 times more likely to have been Russians (for whatever reason), the priors for “internet-based attack” still say “most likely person or persons unknown.” That’s a nonsensical argument, of course, but that’s Bayesian priors for you – don’t blame me. Of course the Bayesian analyst will say “we need to focus on what we know about the particular case,” i.e., look within my carefully cherry-picked data and you’ll find the data I picked. You want a probability? The likelihood is very high that we will never know who hacked the DNC.
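The renormalization problem can be shown mechanically. In the sketch below (all weights invented for illustration), a Bayesian “survey” is run twice – once with a base-rate-dominant “person(s) unknown” hypothesis, and once with it quietly dropped from the list:

```python
# A toy sketch of how leaving a dominant hypothesis off the list distorts
# a Bayesian "survey". All weights are invented for illustration.

def posteriors(priors):
    """Normalize a dict of unnormalized prior weights into probabilities."""
    total = sum(priors.values())
    return {h: w / total for h, w in priors.items()}

# Base rates say most attacks are never attributed -- "person(s) unknown"
# dwarfs every named suspect.
full = {
    "Kremlin": 1.0,
    "China": 1.0,
    "Independent team": 5.0,
    "Person(s) unknown": 1000.0,   # invented weight reflecting base rates
}

# The same survey with the obvious answer silently omitted.
trimmed = {h: w for h, w in full.items() if h != "Person(s) unknown"}

print(posteriors(full))     # "unknown" swamps everything
print(posteriors(trimmed))  # its mass silently redistributes to named suspects
```

With “unknown” on the list it soaks up nearly all the probability; remove it, and that probability is reassigned to the named suspects without any new evidence appearing – which is exactly the bias being described above.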
There’s even more unconscious bias in Hornbeck’s list: he left Guccifer 2.0 off as an option. Here you have someone who has actually claimed responsibility, left off the list of priors, because Hornbeck’s subconscious presupposition is that “Russians did it”: he implicitly collapsed the prior probability of “Guccifer 2.0” into “Russians.” That may or may not be a warranted assumption, but in order to make it, you have to presuppose the Russians did it. Again, I am not accusing Hornbeck of intellectual dishonesty; this is a good vehicle for a discussion about the flaws inherent in using Bayesian arguments to promote subjective claims into objective claims.
I added J) because Hornbeck added himself. And, like Hornbeck, I added myself in order to dishonestly bias the sample: both Hornbeck and I know whether or not we did it. Adding myself as an option biases the survey by slipping my knowns in among the audience’s unknowns, and pretending to my audience that they are unknowns. No, that is not an admission that I am Guccifer 2.0 – nor do I know who Guccifer 2.0 is – in that respect I’m as ignorant as pretty much everybody else. I’m falsely adding weight to my priors. If I knew HJ was Guccifer 2.0, my priors would mysteriously shift in that direction; it’s basic sampling bias in survey design.
I) is also a problem for the “Russian hackers” argument. As I described [stderr], the DNC hack appears to have been done using a widely available PHP remote management tool, after some kind of initial loader/breach. If you want a copy of it, you can get it from github. Now, have we just altered the ‘priors’ that it was a Russian? Remember – one of the critical arguments that it was Russian hackers was the tools they were using. The early stages of the argument that “it was Russians” depended on the expertise required, while experienced security professionals have been pointing out over and over again that this was not a high-expertise attack. We would be singing a different song if the hack had used an unreleased tool that contained fingerprints of TAO’s techniques, but we’re not. Imagine if you showed me a piece of furniture and said, “See, it was made by a master! The cuts in the wood are so straight and fine, only someone who has mastered a hand-saw can make cuts like that!” And I reply, “Have you ever heard of a ‘table-saw’?” That’s what we’re talking about here: if Guccifer 2.0 is the person who did it, it doesn’t mean they’re a Russian agent using super cyber-ninja tools developed by the FSB. I don’t think you can even construct a coherent Bayesian argument around the tools involved, because there are too many possibilities:
- Guccifer is a Russian spy whose tradecraft is so good that they used basic off the shelf tools
- Guccifer is a Chinese spy who knows that Russian spies like a particular toolset and thought it would be funny to appear to be Russian
- Guccifer is an American hacker who used basic off the shelf tools
- Guccifer is an American computer security professional who works for an anti-malware company who decided to throw a head-fake at the US intelligence services
It’s like the poison contest in The Princess Bride – Vizzini can’t decide which cup the poison is in because he’s too smart for his own good. Since the premise is that the attacks were done by sophisticated spies, there’s no valid reason to rule out an arbitrary number of sophisticated head-fakes. One of the other problems with surveys and statistics is that they assume the person responding to the survey is not manipulating it. I see no reason to grant that assumption here.
Back to Hornbeck:
To do this we pretend we have no information regarding the DNC hack, merely that it occurred, and assess how likely each hypothesis is in turn.
This is a nonsense posture of faux Cartesian ignorance. Of course you are not pretending you have no information about the attack! You encoded a great deal of knowledge about the attack in the initial set of options!
But not necessarily a lot of time. If you have a lot of evidence, or all your priors are roughly equal, then the evidence will overwhelm your priors. The less equal they are, or the less evidence you have, the more important it is to get the prior likelihoods right.
What Hornbeck is saying is, “Your data should be lightly fried, but not cooked completely. It should not be blackened on both sides, because data that is cooked too thoroughly gets rubbery and hard to chew.” Joking aside, I cannot read that as anything other than an admonition to make sure that your presuppositions are projected through your priors, so that you weight the evidence to support the conclusion you already reached.
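To be fair, the mechanical claim in Hornbeck’s quote is true: with enough evidence, likelihoods do swamp priors. Here is a toy sketch (all numbers invented) of two observers with very different priors converging as independent observations accumulate – and of how much rides on the priors when evidence is thin:

```python
# Toy sketch of odds-form Bayesian updating. All numbers are invented
# for illustration.

def update(prior_a, likelihood_ratio, n_observations):
    """Posterior probability of hypothesis A after n independent
    observations, each favoring A over its rival by likelihood_ratio.
    Posterior odds = prior odds * likelihood_ratio ** n."""
    prior_odds = prior_a / (1 - prior_a)
    posterior_odds = prior_odds * likelihood_ratio ** n_observations
    return posterior_odds / (1 + posterior_odds)

# A skeptic (1% prior) and a believer (90% prior) watch the same stream
# of evidence, each observation favoring hypothesis A by 2:1.
for n in (0, 1, 10):
    print(n, round(update(0.01, 2.0, n), 3), round(update(0.90, 2.0, n), 3))
```

With no observations the priors rule the answer; after ten 2:1 observations even the skeptic is past 90%. The catch is exactly the one Hornbeck names: when the evidence is sparse, the output is almost entirely the made-up priors echoed back.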
The quality of the evidence matters too. If the news anchor got their info from another weatherperson running a separate model, then the relative likelihood is almost as good as if it came directly from that person. If they got their info by looking at the clouds, then it barely increases the likelihood.
At this point I have to circle back to my earlier point: if there is empirical evidence that applies to the situation, it is going to be more important than any amount of conjecture. At which point, we should acknowledge that the empirical evidence is pretty much everything we need and focus on that, rather than trying to amplify our ignorance using statistical models.
Finally, plunk it all into Bayes’ Theorem and pit multiple hypotheses against each other.
I don’t believe that’s anything resembling an accurate description of what’s going on. An accurate description might read something like:
“Finally, plug your assumptions into Bayes’ Theorem and you will learn which of your assumptions carries the most weight in your model. You gain no actual knowledge about objective reality, but you’ll achieve some clarity regarding your presuppositions and biases. Unfortunately, since you’re biased, you will have left unconsidered those things you did not think to consider, even if those things are vastly more likely than any of your assumptions.”
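Mechanically, “pitting hypotheses against each other” is just the following (hypothesis names and all numbers below are hypothetical placeholders):

```python
# A minimal sketch of multi-hypothesis Bayes: posterior is proportional to
# prior * likelihood, normalized over the hypotheses you chose to list.
# All names and numbers are hypothetical placeholders.

def bayes(priors, likelihoods):
    """Posterior over the listed hypotheses, given P(evidence | H) for each."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

priors      = {"Kremlin": 0.3, "China": 0.2, "Independent": 0.5}
likelihoods = {"Kremlin": 0.6, "China": 0.1, "Independent": 0.3}  # P(E | H)

post = bayes(priors, likelihoods)
print(post)
# Note: a hypothesis you never listed gets probability zero by construction,
# no matter what the evidence says -- the theorem only redistributes belief
# among the assumptions you fed it.
```

Which is the point of the rewrite above: the arithmetic is fine, but it can only ever rank the inputs you gave it.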
Switching to part 2 of Hornbeck’s piece:
What’s the prior odds of the Kremlin hacking into the DNC and associated groups or people?
I’d say they’re pretty high.
Why didn’t he just start by assuming the Russians did it? That’s what the article in Politico that he references also assumes. I’m comfortable with the possibility that the security consultants from SecureWorks who analyzed the Podesta attack may be right. I’m also comfortable with the fact that they have not presented enough evidence to convince me. I’d expect any of you to be skeptical if I say something and refuse to substantiate it (or fail to delineate it as opinion if it’s not a matter of fact), and I expect any of you to be skeptical about what SecureWorks or anyone else says unless it has been vetted by at least some oppositional thinking. For example, when I read Kaspersky’s report on the TAO toolset, there was enough information contained within it that was consistent, matched other things I know, and did not contradict other things I know – I’m willing to accept the Kaspersky report as factual until someone presents contradicting information. Of course, then I wouldn’t need to accept someone else’s ‘priors’ – basically, what Hornbeck is doing is accepting SecureWorks’ priors regarding another case entirely as applicable to the case of the DNC. I don’t see how that’s remotely defensible. You may as well just accept your ‘priors’ as being whatever some authority told you, and stop all the pseudoscientific math and the posture of Cartesian ignorance.
I think Hornbeck is arguing in good faith, but is accepting as ‘evidence’ things that experts in the field consider to be outright funny. Or, we would consider it to be outright funny if the consequences weren’t so serious. Back when the FBI was claiming North Koreans were hacking Sony, some friends of mine and I wore “Elite Hacker Unit” T-shirts I made on spreadshirt.com using Google-translated Korean. We thought that was funny. I still think the Cyrillic character set angle for attributing hacking to Russia is funny. I mean, seriously. I don’t want to rain on anyone’s prior probabilities, but hacking is a huge industry in Romania, Belarus, Ukraine, Poland, Slovenia, and Serbia – several of which also use Cyrillic typefaces. Basically, it depends where you got your keyboard. Or, more precisely, it depends where whoever wrote your malware got their keyboard. It is not evidence.
That’s a huge problem for Bayesian analysis: you’ve got a bit of evidence stuck in your priors that one person appears willing to accept, while another is not. Does that mean you just go “too bad” and crunch your priors – garbage in, garbage out – or do you stop the process and collect more evidence? I, of course, would argue for the latter, and nothing but the latter. Once you have enough evidence, you don’t need all the framework of Bayesian analysis. One can say “Bayesian analysis is what you’re doing inside, Marcus” – which is true – but I’m not hiding my assumptions; I’m willing to talk about them openly: what do I know, what don’t I know, what do I wish I knew, what facts would I consider critically important?
It would be tedious to go blow-by-blow through what Hornbeck presents as the evidence supporting his ‘priors’ because it’s a mish-mosh of stuff reported by different sources, speculation, facts, and secondhand analysis. Some of it I think may be true, some of it I think may not be, some of the secondhand analysis I think is pretty good, some of it I think is incredibly thin and weak. I’ve written about the problems with that, elsewhere: what I need is a solid stack of facts that I can form my own assessment from, and until then I’m going to withhold judgement.
I don’t think Hornbeck’s deliberately being deceptive, and I think his explanation of how Bayesian reasoning can be applied is actually pretty good: he neatly illustrates the problem of trying to reach a conclusion from a mish-mosh of half-digested opinion, some facts, and a biased set of starting assumptions. It’s the scientism of Bayesian analysis that bothers me, in this case: it’s pretending to be able to better understand a situation than we can, given the facts we have available to us. I get downright suspicious when someone starts trying to present Bayesian ‘priors’ because it feels like I’m looking at a bunch of rigged data – and 100% of the time so far (how’s that for a prior?) – that’s what it’s been.
I’ve had many discussions about Bayesian priors and by the time we get through sorting out the assumptions, what we’ve had is an argument in which evidence is examined. That’s the ur-process for all skeptics and scientists: what is the evidence, what contradicts it, how does it align with other facts that we know? I’m sure that if Hornbeck and I sat down with a lot of coffee and a few hours, and walked through all the evidence and discussed it and weighed it, we’d have a much better understanding of our views and the situation – that’s how thinking about facts is done – that’s the important analytical process of weighing evidence and understanding the impact of facts. I see throwing a bunch of ‘priors’ and a Bayesian probability out on the table as a cheap way of bypassing the hard work: “here are my conclusions based on my carefully selected facts.”
Because Hornbeck forgot some of the important possibilities in his initial list, all of his subsequent effort is wasted; one cannot meaningfully apply estimates of likelihood after omitting an important part of the list of options, and any attempt to “retcon” another option onto the list is going to be hopelessly distorted by confirmation bias and sunk cost bias. That may not be a problem for the proponents of Bayesian analysis, but to me it’s just bias piled atop bias with a bit of bias thrown in for Kentucky windage.
The ancient skeptics argued that, to be convinced of a thing, one displays all the arguments for it and against it, examines them and considers them fairly, then if there is irrefutable argument for it (or against it) accepts the irrefutable argument. If there is doubt, one withholds judgement and waits for more arguments or facts. It’s a more difficult process, but it’s harder to game because it’s more obvious when someone has cherry-picked facts, or downplayed or omitted others. As I’ve said – I think Hornbeck’s Bayesian analysis is a pretty fair example of that kind of approach – I think he’s doing a service by illustrating how bad Bayesian analysis can be.
Now, to some people this isn’t good enough.
It’s not even close.
This is nicely phrased by Andrew Gelman [gelman]: “As the applied statistician Andrew Ehrenberg wrote in 1986, Bayesianism assumes: (a) Either a weak or uniform prior, in which case why bother?, (b) Or a strong prior, in which case why collect new data?, (c) Or more realistically, something in between, in which case Bayesianism always seems to duck the issue.”
“a matter of any importance” – obviously, if it’s not a matter of importance, then why perform the calculations at all? One of the problems with Bayesianism is that it presents a very strong argument while simultaneously disclaiming it as just a model. That’s a standard critique of statistics, namely that they tell you some interesting things about probabilities, but they can never actually confer knowledge about anything except the statistics themselves.
“bog-standard social science survey” – in a phrase, that’s why I reject most of what is being done in the social sciences as pseudoscience. Basically, it’s nothing more than a game of building circular definitions and publishing them as interesting results. How do we know Altemeyer’s authoritarianism index is any good? Because it reliably maps to the people who answer it as authoritarians. How do we know IQ tests are accurate? Because they accurately measure performance on IQ tests. Etc.
“promote subjective claims into objective claims” – because we are measuring and reasoning about people’s beliefs we are making objective claims about subjective beliefs. “9 out of 10 doctors recommend a glass of wine with dinner” does not mean a glass of wine with dinner is good or bad for your health; it just means that some percentage of doctors have some beliefs. I am immediately skeptical of such claims because I see them as consciously or unconsciously manipulative: they are trying to democratize truth. As Richard Feynman explained in his example of “how to measure the emperor of China’s nose”: take a load of responses and average them together, instead of asking the emperor’s assistant to ask the emperor’s permission to measure the damn thing.
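Feynman’s point can be made in a few lines. In this sketch (the “true” length and the guess distribution are both invented), averaging ten thousand folklore guesses produces an impressively tight standard error around a number that has nothing to do with the truth:

```python
import random
import statistics

# Feynman's emperor's-nose survey in miniature: nobody surveyed has seen
# the nose, so every "measurement" is folklore drawn from the same
# ignorant range. All numbers are invented.
random.seed(1)
true_length_cm = 4.2                                      # invented ground truth
guesses = [random.uniform(2, 12) for _ in range(10_000)]  # invented folklore

mean = statistics.mean(guesses)
sem = statistics.stdev(guesses) / len(guesses) ** 0.5

# The standard error shrinks as respondents are added, but the answer
# converges on the folklore's average (~7 cm), not on the true 4.2 cm.
print(f"mean = {mean:.2f} +/- {sem:.3f} cm")
```

More respondents buy precision, never accuracy: the survey measures the distribution of beliefs, which is exactly the objective-claims-about-subjective-beliefs distinction above.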
Some stuff about the PAS web shell that was used on the DNC [errata][securi] – if you open your ‘priors’ up to “anyone who can get their hands on a copy of PAS” instead of “Russian hackers” you’ve suddenly added about 100,000 additional potential attackers.
I am referring here to Agrippa the Skeptic’s “modes of dispute,” which were also presented by Sextus Empiricus.
I’ll also observe a funny thing about Bayesian reasoning: it has the same problem that AI models have – you can crunch a whole lot of data down into a probability field or a Markov chain or a neural network, but all they are capable of doing is emitting something that is mathematically highly similar to what you put into them. In that sense, “garbage in, garbage out” ought to serve as a refutation of Bayesian arguments, AI models, and human intelligence.