Russian Hacking and Bayes’ Theorem, Part 1


I’m a bit of an oddity on this network, as I’m pretty convinced Russia was behind the DNC email hack. I know both Mano Singham and Marcus Ranum suspect someone else is responsible, last I checked, and Myers might lean that way too. Looking around, though, I don’t think anyone’s made the case in favor of Russian hacking. I might as well use it as an excuse to walk everyone through using Bayes’ Theorem in an informal setting.

(Spoiler alert: it’s the exact same method we’d use in a formal setting, but with more approximations and qualitative responses.)

Step 1. What are the Hypotheses?

Our first step is to gather up all the relevant hypotheses. There are a lot more choices than “the Kremlin did it” (call that hypothesis A), such as

B) The Chinese government did it.
C) North Korea’s government did it.
D) A skilled independent hacking team did it.
E) The CIA did it.
F) The NSA did it.
G) I did it.

We should be as liberal as possible with our hypotheses, as it’s the easiest way to prevent bias. I could easily rig things by only including two hypotheses, A and G, but if I allow every hypothesis plus the kitchen sink then something I consider wildly unlikely could still become the most likely hypothesis of them all. The hypotheses should be as specific as possible (“bad people did it” won’t give you useful results) but not overly specific (“Sergei Mikhailov did it” is probably false; at best he led a team of hackers). When in doubt about a hypothesis, add it in.

Step 2. What are the Priors for each Hypothesis?

Not all hypotheses are equal, of course, and unless we’re completely clueless we’ll assign different prior probabilities to different hypotheses. To do this we pretend we have no information regarding the DNC hack, merely that it occurred, and assess how likely each hypothesis is in turn.

A good way is to just assign numbers to each hypothesis, representing how likely they are relative to one another. You can renormalize them afterwards so they sum to 1, but that’s not strictly necessary. If you can’t decide on a single number, assign a range of numbers. What’s critical is that the likelihoods are proportional to each other; if A’s prior is twice B’s and B’s prior is twice C’s then A’s prior likelihood must be four times that of C’s. Spend some time getting that right.
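Here is a minimal sketch of that renormalization in Python. The weights are invented purely for illustration (they are not my actual priors for the DNC hack), and `normalize` is a hypothetical helper name.

```python
def normalize(weights):
    """Scale relative weights so they sum to 1, preserving their ratios."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Illustrative relative weights only: A is twice B, and B is twice C.
relative_priors = {"A": 4.0, "B": 2.0, "C": 1.0}
priors = normalize(relative_priors)

for name, p in priors.items():
    print(f"{name}: {p:.3f}")
```

Note that the ratios survive the rescaling: A is still four times as likely as C, which is the only thing that matters.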

But not necessarily a lot of time. If you have a lot of evidence, or all your priors are roughly equal, then the evidence will overwhelm your priors. The less equal they are, or the less evidence you have, the more important it is to get the prior likelihoods right.
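To see how evidence can swamp a lopsided prior, here is a rough sketch. It assumes just two competing hypotheses, and that every piece of evidence is independent and favours the first hypothesis by the same ratio; both assumptions are simplifications, and the numbers and the name `posterior_probability` are made up for the example.

```python
def posterior_probability(prior, likelihood_ratio, n_pieces):
    """Posterior for a hypothesis versus its single rival, after n_pieces of
    independent evidence that each favour it by likelihood_ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_pieces
    return posterior_odds / (1.0 + posterior_odds)

skewed_prior = 0.1  # illustrative: we start out thinking this is unlikely

after_one = posterior_probability(skewed_prior, 2.0, 1)
after_ten = posterior_probability(skewed_prior, 2.0, 10)

print(f"after 1 piece of evidence:   {after_one:.3f}")
print(f"after 10 pieces of evidence: {after_ten:.3f}")
```

With only one piece of evidence the prior still dominates; with ten, it barely matters where we started.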

Step 3. For each bit of Evidence, how Likely is each Hypothesis?

Once finished with the priors, gather up all the evidence you can. Again, be generous and inclusive. Then, for each bit of evidence, determine how likely it is under each hypothesis. This is a lot like what you did for the priors.
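As with the priors, only the relative sizes of these likelihoods matter. A small sketch, using invented numbers and a hypothetical `update` helper, shows that multiplying every likelihood for one bit of evidence by the same constant leaves the result unchanged.

```python
def update(priors, likelihoods):
    """Weight each prior by how likely the evidence is under that
    hypothesis, then renormalize so the results sum to 1."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

# Illustrative values only, not real assessments.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
evidence = {"A": 0.8, "B": 0.4, "C": 0.1}

posterior = update(priors, evidence)

# Same evidence, but every likelihood scaled by an arbitrary constant.
scaled_evidence = {h: 10 * v for h, v in evidence.items()}
posterior_scaled = update(priors, scaled_evidence)

print(posterior)
print(posterior_scaled)
```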

What constitutes “bits of evidence” can be tricky at times. Say that a news anchor and weatherperson both claim that it will rain tomorrow. If the anchor got the information from the weatherperson, you could lump them together as a single bit of evidence. It would be more accurate to keep them separate but treat them as correlated; having a second source does decrease the chance of miscommunication and suggests the weatherperson is confident in their prediction, so the relative certainty of rain tomorrow nudges up ever so slightly with the addition of the anchor’s testimony.

The quality of the evidence matters too. If the news anchor got their info from another weatherperson running a separate model, then the relative likelihood is almost as good as if it came directly from that person. If they got their info by looking at the clouds, then it barely increases the likelihood.

Think carefully here too, especially if there isn’t much evidence. As with priors, getting the relative likelihoods right is critical.

Step 4. Combine it all

Finally, plunk it all into Bayes’ Theorem and pit multiple hypotheses against each other. For each round of evidence, try to identify the hypothesis most favoured by the evidence, then anchor the likelihoods of the rest relative to it. If after a few rounds a collection of hypotheses look like they’ll never rise to the top, you can start taking shortcuts to reduce your calculations. Ideally, any shortcut should err on the side of overstating their likelihoods relative to an honest weighting. There’s no harm in revising likelihoods that you assigned to priors or evidence, so long as you do not hide or forget the previous values.
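The combination step might look something like the sketch below. It assumes each bit of evidence is independent of the others once a hypothesis is fixed, which is rarely exactly true. All the numbers are invented, and `update` and `combine` are hypothetical helpers. In each round the most favoured hypothesis is anchored at 1 and the others are scored relative to it.

```python
def update(beliefs, likelihoods):
    """One round of Bayes' Theorem: weight each belief by the evidence's
    likelihood under that hypothesis, then renormalize."""
    unnormalized = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

def combine(priors, evidence_rounds):
    """Apply each round of evidence in turn, keeping every intermediate
    result so earlier values are never hidden or forgotten."""
    history = [dict(priors)]
    for likelihoods in evidence_rounds:
        history.append(update(history[-1], likelihoods))
    return history

# Illustrative numbers only: A starts out as the least likely hypothesis.
priors = {"A": 0.2, "B": 0.5, "C": 0.3}
evidence_rounds = [
    {"A": 1.0, "B": 0.5, "C": 0.5},
    {"A": 1.0, "B": 0.25, "C": 1.0},
    {"A": 1.0, "B": 0.5, "C": 0.5},
]

history = combine(priors, evidence_rounds)
final = history[-1]
for step, beliefs in enumerate(history):
    print(step, {h: round(p, 3) for h, p in beliefs.items()})
```

In this made-up run, the hypothesis that began in last place ends up on top, which is exactly why it pays to be generous about which hypotheses you include.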

The odds ratio version of Bayes’ Theorem is usually the easiest to deal with, as you only ever need to compare two hypotheses to each other. By doing multiple pairwise comparisons, you do more work but it’s easier, and it still covers the same territory as Bayes Original.
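In odds form, the posterior odds of one hypothesis against another are the prior odds multiplied by the likelihood ratio of each bit of evidence. A minimal sketch, again with invented numbers, a hypothetical function name, and the assumption that the bits of evidence are independent:

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply the prior odds of one hypothesis against another by the
    likelihood ratio of each bit of evidence."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds

# Illustrative: A starts at 0.2 against B's 0.5, so the prior odds are 0.4.
prior_odds_a_vs_b = 0.2 / 0.5

# Each ratio is P(evidence | A) / P(evidence | B); the values are made up.
ratios_a_vs_b = [2.0, 4.0, 2.0]

odds_a_vs_b = posterior_odds(prior_odds_a_vs_b, ratios_a_vs_b)
print(f"A vs B: {odds_a_vs_b:.1f} to 1")
```

Repeating this for A against C, B against C, and so on gives you the full ranking without ever having to normalize across every hypothesis at once.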

This is getting a bit long, so I’ll split the implementation of the above into a second blog post.