A Statistical Analysis of a Sexual Assault Case: Part Three

[complications arise, as does simplicity]

In the last installment, we calculated the odds of nesting or attempted nesting at site 84744 M.S. to be 92%, based on Hugh’s claim. We also found that daufnie_odie’s claim made us 11% confident in nesting.

Hugh, though, was talking about a different point in time. Our original question only asked if the nesting site had seen a nest or attempted nest, without any other clear bounds. It’s similar to asking “will I ever see heads while flipping this coin?” The more distinct observations we have, the greater the chance of at least one head (or nesting attempt) appearing.

The obvious way to combine these two claims is to consider all the possibilities. If we have two independent events, A and B, then the odds of at least one happening are the sum of the odds of three cases: the first happening but not the second, the second happening but not the first, and both happening. That isn’t too annoying to add up when we have just two events, but if we use this technique for N events we’ll have to consider 2^N – 1 possibilities. Ouch.

Notice, though, that we’re calculating the probability of every possible observation combination, excluding one: that no events occurred. However, by definition the sum of all probabilities must be one. So if we calculate the odds of that single combination and subtract it from one, we know the sum of the odds for every other combination. We can accomplish 2^N – 1 calculations for the cost of one!

Putting this into practice with our numbers above, we calculate the odds of Hugh being wrong about the nest and the odds of daufnie_odie being wrong, then multiply and subtract that from one, and get 93%. A marginal improvement.
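As a quick check on that shortcut, here is a minimal Python sketch that computes the same figure both ways: by adding up every combination except "nothing happened", and by subtracting the single "nothing happened" case from one. The function names are made up for illustration, and both versions assume the claims are independent.

```python
from itertools import product

def at_least_one_by_enumeration(probs):
    """Sum the probability of every combination except 'no event occurred'."""
    total = 0.0
    for outcome in product([True, False], repeat=len(probs)):
        if not any(outcome):
            continue  # skip the single all-False combination
        p = 1.0
        for happened, prob in zip(outcome, probs):
            p *= prob if happened else (1.0 - prob)
        total += p
    return total

def at_least_one_by_complement(probs):
    """One minus the probability that every independent event fails to occur."""
    none = 1.0
    for prob in probs:
        none *= 1.0 - prob
    return 1.0 - none

claims = [0.92, 0.11]  # confidence in nesting from Hugh and from daufnie_odie
print(round(at_least_one_by_enumeration(claims), 4))  # 0.9288
print(round(at_least_one_by_complement(claims), 4))   # 0.9288
```

The enumeration loop visits 2^N combinations, while the complement needs only N multiplications, which is why the shortcut matters once more than a handful of claims are involved.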

But hold on here; why did I multiply those two together? Let’s pull up a diagram:

[Figure: Dividing the universe by the accounts of Hugh and daufnie_odie]

Our goal is to figure out A / (A + B + C + D). We can use a bit of algebra to rewrite that as

A / (A + B + C + D) = [(A + B) / (A + B + C + D)] × [A / (A + B)]

Oh, there’s our multiplication right there! In English, all we have to do is multiply the odds of daufnie_odie being wrong by the odds of Hugh being wrong when we assume daufnie_odie is wrong.

One problem: we don’t know the odds of the latter, just the odds of Hugh being wrong overall. If those two were dependent events, this could be a big problem, but thankfully they’re independent for our purposes; if we’re calculating the odds of no nest or attempt, we don’t care if two or more people are talking about the same event, we just need them to be wrong about whatever they’re talking about. That means that the vertical partition is exactly as it looks in the diagram, a straight cut across the entire probability space. In math terms, the ratio of A to B is the same as that of C to D, which leads to

A / (A + B) = (A + C) / (A + B + C + D)

So as long as we can be confident daufnie_odie’s claim is independent of Hugh’s, we can treat (A + C) / (A + B + C + D) as A / (A + B) and just multiply.
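To see the "straight cut" argument with numbers, here is a small Python sketch that builds the four areas from two independent error rates and checks that the conditional and overall rates agree. The diagram itself is not reproduced here, so the region labels are an assumption: A is taken as "both wrong", B as "only daufnie_odie wrong", C as "only Hugh wrong", and D as "neither wrong".

```python
def areas(p_hugh_wrong, p_daufnie_wrong):
    """Split a unit square into four regions, assuming the claims are independent.

    ASSUMPTION: the region labels below are inferred from the text, not the diagram.
    """
    a = p_daufnie_wrong * p_hugh_wrong              # A: both wrong
    b = p_daufnie_wrong * (1 - p_hugh_wrong)        # B: only daufnie_odie wrong
    c = (1 - p_daufnie_wrong) * p_hugh_wrong        # C: only Hugh wrong
    d = (1 - p_daufnie_wrong) * (1 - p_hugh_wrong)  # D: neither wrong
    return a, b, c, d

a, b, c, d = areas(p_hugh_wrong=0.08, p_daufnie_wrong=0.89)
total = a + b + c + d

hugh_wrong_given_daufnie_wrong = a / (a + b)
hugh_wrong_overall = (a + c) / total
both_wrong = a / total

# Under independence the conditional and overall rates match, so the
# probability of both being wrong is just the product of the two error rates.
print(hugh_wrong_given_daufnie_wrong, hugh_wrong_overall)
print(both_wrong, 0.08 * 0.89)
```

If the two claims were dependent, the first pair of numbers would differ and the simple product would no longer give the area of A.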

But when we take a closer look at daufnie_odie’s post, we realize we’re missing some key facts. They spoke up after reading another post by Pollock Myerson, wondering if the person who contacted them was the same as the one who contacted Myerson. Hopping over to Myerson’s post, we learn that he was introduced to someone claiming to have spotted a nest by Caroline Puppy, and that later on a third person contacted Myerson to validate the original tale. Again no names are mentioned, but Myerson, Puppy, and the third person make it clear that they know this nest claimant.

Scrolling back, we see someone named Bryant Tompsin claiming to know a witness to an attempted nest. This doesn’t look like the same person that contacted Puppy. There’s also a comment by someone who goes by “maryann”, who claims to have spotted at least an attempted nest; whether this is the same person that Tompsin, Puppy or daufnie_odie referred to isn’t clear, but it’s probably not Hugh under a different name.

Scrolling forward, we also find a few posts where Pauline Gray claims to have seen a Sexualis Asoltenti attempt to nest, but leaves out what nesting site she saw it at. Puppy reappears, claiming that she was told by someone named Dijai Gruthi that there was an attempted nesting at 84744 M.S., a fact she later confirmed with someone else who witnessed the same nesting. By comparing photos and accounts, it becomes probable that Pauline Gray was talking about 84744 M.S., that she saw it at the same time as Gruthi, and that Puppy’s other person is Gray. Meanwhile, Tompsin reappears and also claims to have heard of the same attempted nesting from Gruthi.

As all that’s sinking in, we flip open the local birding magazine and find still more. Pauline Gray admits she really was talking about 84744 M.S. and that Gruthi was present for the attempted nest; the unnamed person of Myerson outs themselves as Ali Smyth, a local birder; and a well-respected person named Jim Grandie suggests he saw a nest or attempt at one but waved it off as horseplay, something birds do when drunk. Biff Jag confirms he was around shortly after Smyth’s nesting observation, and someone with the handle “skippingthem” mentions they know someone who was also a witness. daufnie_odie posts again, and confirms that the nesting Ali Smyth saw was not the one they were aware of. Finally, we can infer some information from the state of the nesting site; if it remains constant, that would suggest a nesting or attempt was unlikely, and if it shifted over time then it likely was nested in at some point. Myerson had a look at the long-term state of 84744 M.S., and indeed found evidence of shifting.

Working through all these combinations would be a nightmare. Fortunately, we don’t have to. As we only care if at least one nest or attempted nest happened, we can instead calculate the odds of no nesting occurring and then subtract that from one. This is a much simpler task, which we’ll accomplish in the next installment.

[HJH 2015-07-19: adding some missing links]

A Statistical Analysis of a Sexual Assault Case: Part Two

[the fundamentals of the birds and the bees]

Forget all that talk of sexual assault from last time. Instead, pretend I’m an ornithologist.

Wandering past nesting site 84744 M.S. one day, I wonder if a Sexualis Asoltenti has ever flown in and either nested or attempted to nest there. From various studies, I know the odds of that happening are between six and thirteen percent, making it unlikely. Still, I’m just one person; what have other birdwatchers seen? When I get home, I pull up the favourite web forum for local birders and have a look.

I immediately spot a post by Douglas Hugh, who claims to have seen a nesting Sexualis Asoltenti there. What does that do to the odds? Let’s diagram it out.

[Figure: The entire universe of possible outcomes.]

This rectangle represents every possible situation: that no nest exists, that it was made of discarded twine, that Wile E. Coyote instead threw an Acme Portable Hole in there, and so on. We can slice that space by partitioning it into two, one side containing all possibilities where the nest was built or attempted, the other containing the inverse.

[Figure: Partitioning the probabilities into …]

[I should mention these areas aren’t to scale. I’m just focusing on topology here.]

As this rectangle represents every possibility, it also contains scenarios that include Hugh claiming a nest, as well as Hugh not making any such claim. We can further partition the space.

[Figure: All possibilities partitioned both by whether or not a nest/attempt was made, and whether or not Hugh claims to have seen a nest.]

[I should also mention that these boundaries aren’t necessarily accurate. Topology, remember. Also, I wrote this a good three weeks before I saw Jamie’s similar post about Bayes’ Theorem over at SkepChick. Scout’s honour!]

Those previous studies I mentioned represent the area of (A + C) divided by the area of (A + B + C + D).

While we may not know the status of the nest, we do know whether or not Hugh made the claim. Areas C and D are contrary to reality, thus should be dropped from this analysis. The odds of a nest or attempted nest is now the area of A divided by the area of (A + B); in English, that’s the number of instances where Hugh claims a nest, and there is one, as compared to the number of instances where he falsely claims there’s a nest there plus the number of true claims.

As luck would have it, we already have a number to substitute in. Prior research puts the odds of a false nesting claim for Sexualis Asoltenti at between 2% and 8%; this means that A / (A + B) is about 92-98%. I’ll take the more conservative value, and say 8% of claims are mistaken, fabricated, or something else, which leaves us 92% confident in nesting. Easy enough.

After figuring all that out, I spot a post from someone named “daufnie_odie.” They claim to have heard a birder mention they’d spotted a nest at 84744 M.S. No name is given, but the context makes it fairly clear they know this person.

We got lucky last time, because that 8% was for cases where someone claimed they saw a nest or attempted nest, which was exactly the scenario we had. No such luck here, plus there’s a layer of indirection we need to account for. Here’s a first attempt at that:

[Figure: All probabilities, partitioned by whether there was an attempted/actual nest AND daufnie_odie was approached, vs. daufnie_odie making a claim.]

On our diagram, the odds of “someone genuinely spots a nest or attempt and mentions it to daufnie_odie” corresponds to the areas where daufnie_odie was approached, A and C, divided by all areas, which is (A + C) / (all). As this box represents all possibilities, and has a total area of one, the odds of the negation of the prior claim (specifically, that there was no nesting, or a false claim, or the news never reaching daufnie_odie), is (1 – (A + C) / (all)) or (B + D) / (all).

Even if that original person saw a nest, though, it’s possible they’d never mention it. We know the first probability, so I’ll put the second at… oh… one third, then multiply the two values together to reach the chance of both events happening.

[Why multiplication? I’ll explicitly cover that in part 3, but if you pay real close attention you’ll get a preview below.]

At this point, I bet a number of you are about to quit in disgust. I just pulled that number out of thin air, and doesn’t that taint the whole enterprise?

If that probability is wildly different from reality, it might. Or, it might not. As I pointed out earlier, if we’re testing the bias of a coin and take a few bad tosses, that could throw off the measurement… but only if we only do a dozen throws. If we do a thousand, it’ll have no significant effect on our final results. Likewise, a bad guess among several good ones will be neutralized, and a lot of fuzzy measurements can combine to create a precise one.
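The coin comparison is easy to put numbers on. The sketch below is a deliberately simple illustration, not a simulation: it assumes a fair coin that lands heads exactly half the time, and three tosses that were wrongly recorded as heads. Both the helper name and the figure of three bad tosses are invented for the example.

```python
def estimated_bias(heads, tosses, bad_tosses):
    """Heads fraction when `bad_tosses` tails were wrongly recorded as heads."""
    return (heads + bad_tosses) / tosses

for tosses in (12, 1000):
    true_heads = tosses // 2  # a fair coin, on average
    clean = estimated_bias(true_heads, tosses, 0)
    tainted = estimated_bias(true_heads, tosses, 3)
    print(tosses, clean, round(tainted, 3))
# 12 tosses:   0.5 becomes 0.75
# 1000 tosses: 0.5 becomes 0.503
```

The same three mistakes shift the estimate by 25 percentage points in the small sample and by 0.3 points in the large one.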

Most importantly, we live in an era of cheap computing. I can run a large number of simulations and check how the parameters change over a wide range of values, giving myself a solid idea of how stable the results are. A little fuzziness is no problem, and who knows? My ad-hoc guess could be bang on the money. This is also handy for anyone who disagrees with my numbers; just plug in your own instead and rerun the analysis.

But back to that. We now need to figure out the odds of daufnie_odie publicly stating their claim, assuming they actually were approached. Maybe they’d forget, or be embarrassed by the situation, but that’s highly unlikely (92%-98% of such claims are legitimate, remember), and this person has some protection by being pseudo-anonymous. I’ll make this probability fairly high, say 95% or so. This corresponds to A / (A + C) in the diagram.

There’s also the possibility that daufnie_odie is making the entire thing up. The pseudo-anonymous argument cuts both ways, also arguing that a false claim is more likely. Nonetheless, an anonymous person that’s careless could be tracked down and held accountable for their words. Given all that, let’s put this probability at an even 50/50. Note that this corresponds to B / (B + D).

Now we can calculate A / (A + B). Multiplying the odds of nesting and this person approaching daufnie_odie, with the odds of daufnie_odie sharing the claim with us, nets us A; multiplying the odds of no nesting or daufnie_odie not being approached, with the odds of daufnie_odie making the whole thing up, arrives at B. Put A in the numerator, and the sum of (A + B) in the denominator.

[Figure: The full math behind daufnie_odie's case. Trust me, it's a bit ugly looking.]

That’s a pain to write out, though. Let’s clean things up with some substitution; we’ll label the claim “there was a nest or attempted nest and daufnie_odie was approached by a witness” with the letter “H”, and daufnie_odie’s stating that happened will become “E”. To denote the opposite of a claim, like “daufnie_odie did not state they knew of nesting,” we’ll put a little mark in front of it; in this case, that’d look like “¬E”. To refer specifically to the probability of X happening, we’ll say “P(X)”, and if we talk about the odds of X happening given Y did happen, we’ll write “P(X | Y)”. With these simplifications, the math translates into

[Figure: Bayes' Theorem, in binary mode.]

Whoops, we’ve accidentally derived a simplified version of Bayes’ Theorem. Ah well, either way we’ve calculated an 11% chance that there was a nest or attempted nest, given daufnie_odie’s post (though as you’ll see later, that number’s a bit naive). As we’re partitioning the probability space, that implies an 89% chance there was no nest or attempt at one.
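For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the two-hypothesis form of Bayes’ Theorem, P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | ¬E-free term P(E | ¬H) P(¬H))), written out as area A over areas A plus B. The 95% and 50% figures come from the text above. The value of P(H) is an assumption: the exact figure used is not stated above, so 0.06 is chosen purely because it lands close to the quoted 11%. The sweep values are likewise hypothetical.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Binary Bayes' Theorem: P(H | E) = A / (A + B)."""
    true_claim = p_e_given_h * p_h               # area A: H holds and E is stated
    false_claim = p_e_given_not_h * (1.0 - p_h)  # area B: H fails but E is stated anyway
    return true_claim / (true_claim + false_claim)

P_H = 0.06  # ASSUMPTION: illustrative only, not a value given in the text
print(round(posterior(P_H, 0.95, 0.50), 3))  # 0.108

# A rough stability check: vary the guessed inputs and watch the output.
for p_false in (0.25, 0.50, 0.75):   # hypothetical odds of a made-up claim
    row = [round(posterior(p_h, 0.95, p_false), 3) for p_h in (0.02, 0.06, 0.13)]
    print(p_false, row)
```

Swapping in a different P(H) or a different chance of fabrication and rerunning the loop shows how much the result depends on each guess.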

How do we combine these two accounts together? That’s for part 3.

[HJH 2015-06-09: Minor edits for clarity.]
[HJH 2015-06-19: Emphasized daufnie_odie’s probability would change later.]
[HJH 2015-07-19: Adding a missing link.]