One Hundred Prisoners

Here’s a question to puzzle out:

An especially cruel jailer announces a “game” to their 100 prisoners. A cabinet with 100 drawers sits in a heavily monitored room, and each drawer holds one prisoner’s number. If every prisoner draws their own number from a drawer, every one of them walks free; if even one of them fails, however, all the prisoners must spend the rest of their days in solitary confinement. Each prisoner must leave the drawers and the room exactly as they found them, or everyone heads to solitary, and to ensure no hints are passed along, each prisoner goes directly to solitary confinement after their attempt. The jailer does offer a little mercy, though: each prisoner can check up to half the drawers in the cabinet during their attempt, and collectively the prisoners have plenty of time beforehand to brainstorm a strategy.

What is the best one they could adopt?

This seems like a hopeless situation, no doubt. The odds of any one prisoner randomly finding their number are 50%, and the odds of that happening 100 times in a row are so low they make death by shark look like a sure thing.

Nonetheless, the prisoners settle on a strategy. With a little programming code, we can evaluate the chances it’ll win them all their freedom.

      Algorithm	    Trials	      Successes	Percentage
   Random Guess	     50000	              0	0.0000000
         Cyclic	     50000	          15687	31.3740000
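
The cyclic strategy has each prisoner start at the drawer bearing their own number, then repeatedly open whichever drawer the number they just found points to. Here is a minimal simulation sketch (my own illustration, not the code behind the table):

```python
import random

def attempt(n, strategy):
    """One full trial: all n prisoners try; True only if every one succeeds."""
    drawers = list(range(n))          # drawer i holds prisoner number drawers[i]
    random.shuffle(drawers)
    return all(strategy(drawers, p) for p in range(n))

def random_guess(drawers, prisoner):
    # Open n/2 drawers chosen uniformly at random.
    opened = random.sample(range(len(drawers)), len(drawers) // 2)
    return any(drawers[d] == prisoner for d in opened)

def cyclic(drawers, prisoner):
    # Open your own drawer number first, then follow the chain of numbers.
    d = prisoner
    for _ in range(len(drawers) // 2):
        if drawers[d] == prisoner:
            return True
        d = drawers[d]
    return False

trials = 10_000
wins = sum(attempt(100, cyclic) for _ in range(trials))
print(f"Cyclic: {wins / trials:.4f}")  # ≈ 0.31
```

The trick is that a prisoner following the chain is guaranteed to end at their own number, provided the permutation cycle they sit on is no longer than 50.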

Whhaaa? How can the prisoners pull off odds like that? [Read more…]

Model Failure

This may be hard to believe, but I’m not about to talk about Bayesian modeling or CompSci. Nope, I got dragged into an argument over implicit bias with a science-loving “skeptic,” and a few people mobbed me over the “model minority.”

Asian-Americans, like Jews, are indeed a problem for the “social-justice” brigade. I mean, how on earth have both ethnic groups done so well in such a profoundly racist society? How have bigoted white people allowed these minorities to do so well — even to the point of earning more, on average, than whites? Asian-Americans, for example, have been subject to some of the most brutal oppression, racial hatred, and open discrimination over the years. In the late 19th century, when most worked in hard labor, they were subject to lynchings and violence across the American West, as well as laws that prohibited their employment. They were banned from immigrating to the U.S. in 1924. Japanese-American citizens were forced into internment camps during the Second World War, and subjected to hideous, racist propaganda after Pearl Harbor. Yet, today, Asian-Americans are among the most prosperous, well-educated, and successful ethnic groups in America. What gives?

What gives is simple demographics. Take it away, Jeff Guo of the Washington Post: [Read more…]

Fake Hate, Frequentism, and False Balance

This article from Kiara Alfonseca of ProPublica got me thinking.

Fake hate crimes have a huge impact despite their rarity, said Ryan Lenz, senior investigative writer for the Southern Poverty Law Center Intelligence Project. “There aren’t many people claiming fake hate crimes, but when they do, they make massive headlines,” he said. It takes just one fake report, Lenz said, “to undermine the legitimacy of other hate crimes.”

My lizard brain could see the logic in this: learning one incident was a hoax opened up the possibility that others were hoaxes too, which was comforting if I thought the world was fundamentally moral. But with a half-second more thought, that view seemed ridiculous: if our sample’s hoax rate goes from 0% to 11%, we’ve still got good reason to think the true hoax rate is low.

With a bit more thought, I realized I had enough knowledge of probability to determine who was right.
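
As a sketch of that probability argument (my own toy version: the 11% figure is from the article, but the sample size of nine incidents is an assumption I’ve made purely for illustration), we can put a uniform prior on the hoax rate and update on the hypothetical sample:

```python
# Toy check: with a uniform prior on the hoax rate, how sure are we that the
# rate is below one half after seeing 1 hoax in a hypothetical sample of 9
# reported incidents (~11%)?

def beta_pdf(x, a, b):
    # Unnormalized Beta(a, b) density.
    return x ** (a - 1) * (1 - x) ** (b - 1)

def posterior_prob_below(threshold, hoaxes, total, grid=100_000):
    # P(hoax rate < threshold | data) under a uniform Beta(1, 1) prior,
    # computed by midpoint-rule integration of the Beta posterior.
    a, b = 1 + hoaxes, 1 + total - hoaxes
    xs = [(i + 0.5) / grid for i in range(grid)]
    weights = [beta_pdf(x, a, b) for x in xs]
    below = sum(w for x, w in zip(xs, weights) if x < threshold)
    return below / sum(weights)

print(posterior_prob_below(0.5, hoaxes=1, total=9))  # ≈ 0.99
```

Even after one confirmed hoax, the posterior still puts almost all its weight on hoaxes being the minority of reports.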

[Read more…]

P-values are Bullshit, 1942 edition

I keep an eye out for old criticisms of null hypothesis significance testing. There’s just something fascinating about reading this…

In this paper, I wish to examine a dogma of inferential procedure which, for psychologists at least, has attained the status of a religious conviction. The dogma to be scrutinized is the “null-hypothesis significance test” orthodoxy that passing statistical judgment on a scientific hypothesis by means of experimental observation is a decision procedure wherein one rejects or accepts a null hypothesis according to whether or not the value of a sample statistic yielded by an experiment falls within a certain predetermined “rejection region” of its possible values. The thesis to be advanced is that despite the awesome pre-eminence this method has attained in our experimental journals and textbooks of applied statistics, it is based upon a fundamental misunderstanding of the nature of rational inference, and is seldom if ever appropriate to the aims of scientific research. This is not a particularly original view—traditional null-hypothesis procedure has already been superceded in modern statistical theory by a variety of more satisfactory inferential techniques. But the perceptual defenses of psychologists are particularly efficient when dealing with matters of methodology, and so the statistical folkways of a more primitive past continue to dominate the local scene.[1]

… then realising it dates from 1960. So far I’ve spotted five waves of criticism: Jerzy Neyman and Egon Pearson head the first, dating from roughly 1928 to 1945; a number of authors, such as the above-quoted Rozeboom, formed a second wave between roughly 1960 and 1970; Jacob Cohen kicked off a third wave around 1990, which maybe lasted until his death in 1998; John Ioannidis spearheaded another wave in 2005, though it died out even quicker; and finally there’s the “replication crisis,” which kicked off in 2011 and is still ongoing as I type this.

I do like to search for papers outside of those waves, however, just to verify the partition. This one doesn’t qualify, but it’s pretty cool nonetheless.

Berkson, Joseph. “Tests of Significance Considered as Evidence.” Journal of the American Statistical Association, vol. 37, 1942, pp. 325-335. Reprinted in International Journal of Epidemiology, vol. 32, no. 5, 2003, p. 687.

For instance, Berkson points to a specific example drawn from Ronald Fisher himself. Fisher delves into a chart of eye facet frequency in Drosophila melanogaster at various temperatures, and extracts some means. Conducting an ANOVA test, Fisher states “deviations from linear regression are evidently larger than would be expected, if the regression were really linear, from the variations within the arrays,” then concludes “There can therefore be no question of the statistical significance of the deviations from the straight line.”

Berkson’s response is to graph the dataset.

[Figure: eye facets vs. temperature in Drosophila melanogaster, graphed and fit to a straight line. From Fisher (1938).]

The middle points look like outliers, but it’s pretty obvious we’re dealing with a linear relationship. That Fisher’s tests reject linearity is a blow against using them.

Jacob Cohen made a very strong argument against Fisherian frequentism in 1994, the “permanent illusion,” which he attributed to a 1993 paper by Gerd Gigerenzer.[3][4] I can’t find any evidence Gigerenzer actually named it that, but it doesn’t matter; Berkson scoops both of them by a whopping 51 years, then extends the argument.

Suppose I said, “Albinos are very rare in human populations, only one in fifty thousand. Therefore, if you have taken a random sample of 100 from a population and found in it an albino, the population is not human.” This is a similar argument but if it were given, I believe the rational retort would be, “If the population is not human, what is it?” A question would be asked that demands an affirmative answer. In the null hypothesis schema we are trying only to nullify something: “The null hypothesis is never proved or established but is possibly disproved in the course of experimentation.” But ordinarily evidence does not take this form. With the corpus delicti in front of you, you do not say, “Here is evidence against the hypothesis that no one is dead.” You say, “Evidently someone has been murdered.”[5]

This hints at Berkson’s way out of the p-value mess: ditch falsification and allow evidence in favour of hypotheses. Berkson points to another example or two to shore up the case, but can’t extend this intuition into a mathematical description of how it would work with p-values. A pity, but it was for the best.
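
The arithmetic behind the albino example is worth spelling out: under null-hypothesis logic, drawing even one albino in a random sample of 100 humans is roughly a 1-in-500 event, so a significance tester would happily “reject” the hypothesis that the population is human. A quick check (my own sketch, using Berkson’s figures):

```python
# Probability of at least one albino in a random sample of 100 people,
# given an albinism rate of 1 in 50,000 (Berkson's figures).
p_albino = 1 / 50_000
n = 100
p_at_least_one = 1 - (1 - p_albino) ** n
print(f"p = {p_at_least_one:.5f}")  # ≈ 0.00200, "significant" at any usual threshold
```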

[1] Rozeboom, William W. “The fallacy of the null-hypothesis significance test.” Psychological bulletin 57.5 (1960): 416.

[2] Berkson, Joseph. “Tests of Significance Considered as Evidence.” Journal of the American Statistical Association, vol. 37, 1942, pp. 325-335. Reprinted in International Journal of Epidemiology, vol. 32, no. 5, 2003, p. 687.

[3] Cohen, Jacob. “The Earth is Round (p < .05).” American Psychologist, vol. 49, no. 12, 1994, pp. 997-1003.

[4] Gigerenzer, Gerd. “The superego, the ego, and the id in statistical reasoning.” A handbook for data analysis in the behavioral sciences: Methodological issues (1993): 311-339.

[5] Berkson (1942), pg. 326.

Stat of the Union

Time to do another deep dive on polling in the US. The first item comes via Steven Rosenfeld over at AlterNet. A number of polling companies have examined Trump’s standing in swing states, and compared it to how those states voted. Their findings? Swing-state voters like him more than the average American does, but less than they did when they voted for him. As Chuck Todd, Mark Murray, and Carrie Dann put it at MSNBC,

In the Trump “Surge Counties” — think places like Carbon, Pa., which Trump won, 65%-31% (versus Mitt Romney’s 53%-45% margin) — 56% of residents approve of the president’s job performance. But in 2016, Trump won these “Surge Counties” by a combined 65%-29%. And in the “Flip Counties” — think places like Luzerne, Pa., which Obama carried 52%-47%, but which Trump won, 58%-39% — Trump’s job rating stands at just 44%. Trump won these “Flip Counties” by a combined 51%-43% margin a year ago.

So the sagging support I mentioned a few months ago continues. Rosenfeld also links to a few interviews with Trump voters, to get a more qualitative idea of where they’re at. There’s no real change there: they have a pessimistic view of what he’ll accomplish, but praise him as a disruptor in fairly irrational terms. Take Ellen Pieper.

Poll respondent Ellen Pieper is among those disapproving of the president’s performance so far. The independent from Waukee voted for Trump and said she still believes in his ideas and qualifications. It’s how he behaves that bothers her. “He’s trying to move the country in the right direction, but his personality is getting in the way,” she said, calling out his use of Twitter in particular. “He’s a bright man, and I believe he has great ideas for getting the country back on track, but his approach needs some polish.”

Still, Pieper says, she’d vote for him again today.

Rosenfeld also makes some interesting comparisons to Nixon, but you’ll have to click through for that.

The second item comes via G. Elliott Morris, who’s boosted some diagrams made by Ian McDonald as well as their own. [Read more…]

Russian Hacking and Bayes’ Theorem, Part 2

I think I did a good job of laying out the core hypotheses last time, save two: that the Iranian government or a disgruntled Democrat did it. I think I can pick those up on the fly, so let’s skip ahead to step 2.

The Priors

What are the prior odds of the Kremlin hacking into the DNC and associated groups or people?

I’d say they’re pretty high. Going right back to the Bolshevik revolution, Russian spy agencies have taken an interest in running disinformation campaigns. They even have a word for gathering compromising information to blackmail people into doing their bidding: “kompromat.” Putin himself earned a favourable place in Boris Yeltsin’s government via some kompromat on one of Yeltsin’s opponents.

As for hacking elections, European intelligence agencies have also fingered Russia for using kompromat to interfere with elections in Germany, the Netherlands, Hungary, Georgia, and Ukraine.

That’s all well and good, but what about other actors? China also has sophisticated information-warfare capabilities, but seems more interested in trade secrets and tends to keep its discoveries under wraps. North Korea is a lot splashier, but has recently focused on financial crimes. The Iranian government has apparently stepped up its online attack capabilities, and has a grudge against the USA, but seems to focus on infrastructure and disruption.

The DNC convention was rather contentious, with fans of Bernie Sanders bitter at how it turned out, and some of them preferred putting Trump in power to voting for Clinton. But a disgruntled Democrat doesn’t fit the timeline: the DNC suspected an attack in April and documents were leaked in June, yet Sanders still had a chance of winning the nomination until the end of July.

An independent group is the real wild card, with any number of possible motivations and, lacking state power, every incentive to make it look like someone else did the deed.

What about the CIA or NSA? The latter claims to be just a passive listener, and I haven’t heard anyone claim otherwise. The CIA has a long history of interfering in other countries’ elections; in Nicaragua in 1990, it even released documents to the media to smear a candidate it didn’t like. It’s one thing to muck around with other countries’ elections, however, as it’ll be nearly impossible for them to extradite you for a proper trial; muck around in your own country’s election, and there’s no shortage of reporters and prosecutors willing to go after you.
Where does all this get us? I’d say to a tier of prior likelihoods:
  • “The Kremlin did it” (A) and “Independent hackers did it” (D) have about the same prior.
  • “China,” (B) “North Korea,” (C) “Iran,” (H) and “the CIA” (E) are less likely than the prior two.
  • “the NSA” (F) and “disgruntled insider” (I) are less likely still.
  • And c’mon, I’m not nearly good enough to pull this off. (G)
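
To make those tiers concrete, here’s one way to turn them into normalized priors. The weights below are purely illustrative numbers of my own invention; only the letters and tier ordering come from the post:

```python
# Purely illustrative weights for the tiers above; the letters follow the
# post's hypothesis labels, the numbers are made up.
tiers = {
    "A: Kremlin": 4.0, "D: independent hackers": 4.0,   # top tier
    "B: China": 1.0, "C: North Korea": 1.0,
    "H: Iran": 1.0, "E: CIA": 1.0,                      # middle tier
    "F: NSA": 0.25, "I: disgruntled insider": 0.25,     # lower still
    "G: the author": 0.0001,                            # c'mon
}
total = sum(tiers.values())
priors = {h: w / total for h, w in tiers.items()}
for hypothesis, p in priors.items():
    print(f"{hypothesis}: {p:.4f}")
```

Any weights preserving the ordering would do; the point of a qualitative analysis is that the conclusion shouldn’t hinge on the exact values.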

The Evidence

I haven’t attached numbers to the priors, because the evidence side of things is pretty damning. Let’s take a specific example: the Cyrillic character set found in some of the leaked documents. We can both agree this could be faked: switch around the keyboard layout, plant a few false names, and you’re done. Do it flawlessly and no one will know otherwise.

But here’s the kicker: is there another hypothesis which is more likely than “the Kremlin did it,” on this bit of evidence? To focus on a specific case, is it more likely that an independent hacking group would leave Cyrillic characters and error messages in those documents than that Russian hackers would? This seems silly; an independent group could leave a false trail pointing to anyone, which dilutes the odds of them pointing the finger at one specific someone. Even if the independent group had a bias towards putting the blame on Russia, there’s still a chance they’d finger someone else.
Put another way, a die numbered one through six could turn up a one when thrown, but a die with only ones on each face would be more likely to turn up a one. A one is always more likely from the second die. By the same token, even though it’s entirely plausible that an independent hacking group would switch their character sets, the evidence still provides better proof of Russian hacking.
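
In likelihood-ratio terms, the die analogy looks like this (every number below is a toy value of my own, chosen only to show the shape of the update):

```python
# H1: a die with a one on every face; H2: a fair six-sided die.
# Observing a "one" is evidence for H1, even though H2 can produce it too.
p_one_h1 = 1.0
p_one_h2 = 1 / 6
lr_die = p_one_h1 / p_one_h2        # ≈ 6: a one is six times likelier under H1

# Same shape for the Cyrillic evidence: suppose an independent group planting
# a false trail picks among, say, 10 plausible scapegoats at random, while
# Russian hackers leave Cyrillic traces half the time. (Both numbers are
# invented purely for illustration.)
lr_cyrillic = 0.5 / (1 / 10)        # ≈ 5: the evidence still favours the Kremlin

prior_odds = 1.0                    # even prior odds, for illustration
posterior_odds = prior_odds * lr_cyrillic
print(lr_die, posterior_odds)
```

Whatever the exact figures, as long as the trail is likelier under “Russia did it” than under any rival hypothesis, the posterior odds shift towards Russia.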
What does evidence that points away from the Kremlin look like?

President Vladimir Putin says the Russian state has never been involved in hacking.

Speaking at a meeting with senior editors of leading international news agencies Thursday, Putin said that some individual “patriotic” hackers could mount some attacks amid the current cold spell in Russia’s relations with the West.
But he categorically insisted that “we don’t engage in that at the state level.”

Is this great evidence? Hell no; it’s entirely possible Putin is lying, and given the history of the KGB and FSB, it’s probable. But all that does is blunt the magnitude of the likelihoods; it doesn’t change their direction. By the same token, this…
Intelligence agency leaders repeated their determination Thursday that only “the senior most officials” in Russia could have authorized recent hacks into Democratic National Committee and Clinton officials’ emails during the presidential election.
Director of National Intelligence James Clapper affirmed an Oct. 7 joint statement from 17 intelligence agencies that the Russian government directed the election interference…
…counts as evidence in favour of the Kremlin being the culprit, even if you think James Clapper is a dirty rotten liar. Again, we can quibble over how much it shifts the balance, but no other hypothesis is more favoured by it.
We can carry on like this through a lot of the other evidence.

I can’t find anyone who’s suggested North Korea or the NSA did it. The consensus seems to point towards the Kremlin, and while there are scattered bits of evidence pointing elsewhere, there isn’t a lot of credibility or analysis attached to them; some of it amounts to “anyone but Russia” rather than “group X did it,” which dilutes the gains made by any other single hypothesis.

The net result is that the already-strong prior for “the Kremlin did it” combines with the direction the evidence points in, favouring that hypothesis even more. How strongly it favours that hypothesis depends on how you weight the evidence, but you have to do some wild contortions to put another hypothesis ahead of it. A qualitative analysis is all we need.

Now, to some people this isn’t good enough. I’ve got two objections to deal with: one from Sam Biddle over at The Intercept, and another from Marcus Ranum at stderr. Part three, anyone?

A Third One!

I know, I know, these are starting to get passé. But this third event brings a little more information.

For the third time in a year and a half, the Advanced Laser Interferometer Gravitational Wave Observatory (LIGO) has detected gravitational waves. […]

This most recent event, which we detected on Jan. 4, 2017, is the most distant source we’ve observed so far. Because gravitational waves travel at the speed of light, when we look at very distant objects, we also look back in time. This most recent event is also the most ancient gravitational wave source we’ve detected so far, having occurred over two billion years ago. Back then, the universe itself was 20 percent smaller than it is today, and multicellular life had not yet arisen on Earth.

The mass of the final black hole left behind after this most recent collision is 50 times the mass of our sun. Prior to the first detected event, which weighed in at 60 times the mass of the sun, astronomers didn’t think such massive black holes could be formed in this way. While the second event was only 20 solar masses, detecting this additional very massive event suggests that such systems not only exist, but may be relatively common.

Thanks to this third event, astronomers can set a stronger upper bound on the mass of the graviton, the proposed force-carrying particle of gravity. They also have some hints as to how these black holes form: the spin axes of the two black holes appear to be misaligned, which suggests they became a binary well after forming, as opposed to starting off as binary stars in orbit. Finally, the absence of another signal tells us something important about intermediate-mass black holes, thousands of times heavier than the Sun but lighter than millions of solar masses.

The paper reports a “survey of the universe for midsize-black-hole collisions up to 5 billion light years ago,” says Karan Jani, a former Georgia Tech Ph.D. physics student who participated in the study. That volume of space contains about 100 million galaxies the size of the Milky Way. Nowhere in that space did the study find a collision of midsize black holes.

“Clearly they are much, much rarer than low-mass black holes, three collisions of which LIGO has detected so far,” Jani says. Nevertheless, should a gravitational wave from two Goldilocks black holes colliding ever get detected, Jani adds, “we have all the tools to dissect the signal.”

If you want more info, Veritasium has a quick summary, while if you want something meatier the full paper has been published and the raw data has been released.

Otherwise, just be content that we’ve learned a little more about the world.

Russian Hacking and Bayes’ Theorem, Part 1

I’m a bit of an oddity on this network, as I’m pretty convinced Russia was behind the DNC email hack. I know both Mano Singham and Marcus Ranum suspect someone else is responsible, last I checked, and Myers might lean that way too. Looking around, though, I don’t think anyone’s made the case in favour of Russian hacking. I might as well use it as an excuse to walk everyone through using Bayes’ Theorem in an informal setting.

(Spoiler alert: it’s the exact same method we’d use in a formal setting, but with more approximations and qualitative responses.)

[Read more…]