Continued Fractions

If you’ve followed my work for a while, you’ve probably noted my love of low-discrepancy sequences. Any time I want to do a uniform sample, and I’m not sure when I’ll stop, I’ll reach for an additive recurrence: repeatedly sum an irrational number with itself, check if the sum is bigger than one, and if so chop it down. Dirt easy, super-fast, and most of the time it gives great results.

But finding the best irrational numbers to add has been a bit of a juggle. The Wikipedia page recommends square roots of primes, but it also claims this is the best choice of all:

\frac{\sqrt{5} - 1}{2}

I couldn’t see why. I made a half-hearted attempt at digging through the references, but it got too complicated for me and I was more focused on the results, anyway. So I quickly shelved that and returned to just trusting that they worked.

That is, until this Numberphile video explained them with crystal clarity. Not getting the connection? The worst possible number to use in an additive recurrence is a rational number. After a finite number of steps it’ll start repeating earlier points, and you’ll never visit anything in between. This is precisely like having outward spokes on your flower (no seriously, watch the video), so what you really want is the irrational number that’s most poorly approximated by rational numbers. And, wouldn’t you know it…

\frac{\sqrt{5} - 1}{2} ~=~ \frac{\sqrt{5} + 1}{2} - 1 ~=~ \phi - 1

… I’ve relied on the Golden Ratio without realising it.
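
One step is worth spelling out: why the golden ratio in particular? Every term of its continued fraction is 1, the smallest a term can be. Small terms mean the truncated fractions, which here are ratios of consecutive Fibonacci numbers, close in on the true value as slowly as possible. That is the sense in which φ is the irrational number worst approximated by rationals. The expansion drops straight out of φ² = φ + 1:

```latex
\begin{aligned}
\phi^2 = \phi + 1 \;&\Longrightarrow\; \phi = 1 + \frac{1}{\phi}
  = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}
  = [1; 1, 1, 1, \ldots] \\
\phi - 1 &= \frac{1}{\phi} = \frac{\sqrt{5} - 1}{2} = [0; 1, 1, 1, \ldots]
\end{aligned}
```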

Want to play around a bit with continued fractions? I whipped up a bit of Go which allows you to translate any number into the integer sequence behind its fraction. Go ahead, muck with the thing and see what patterns pop out.

Abductive and Inferential Science

I love it when Professor Moriarty wanders back to YouTube, and his latest was pretty good. He got into a spot of trouble at the end, which led me to muse on writing a blog post to help him out. I’ve already covered some of that territory, alas, but in the process I also stumbled on something more interesting to blog about. It also affects Sean Carroll’s paper, which Moriarty relied on.

The fulcrum of my topic is the distinction between inference and abduction. The former goes “I have a hypothesis: what does the data say about it?” while the latter goes “I have data: can I find a hypothesis which explains it?” Moriarty uses this as a refutation of falsification: if we start from the data instead of the hypothesis, we’re not trying to falsify anything! To rub salt in the wound, Moriarty argues (and I agree) that a majority of scientific activity consists of abduction and not inference. It’s quite common for scientists to jump from one topic to another, essentially engaging in a tonne of abductive activity until someone forces them to write up a hypothesis. Sean Carroll doesn’t dwell on this as much, but his paper does treat abduction and inference as separate things.

They aren’t separate, at least when it comes to the Bayesian interpretation of statistics. Let’s use a toy example to explain how; here’s a black box with a clear cover:

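import "math/rand"

// blackbox draws x uniformly from [0, 1) and returns a fixed quartic
// polynomial in x, scaled down by 1213.
func blackbox() float64 {
	x := rand.Float64()
	return (4111 + x*(4619+x*(3627+x*(7392*x-9206)))) / 1213
}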

Each time we turn the crank on this function, we get back a number of some sort. The abductive way to analyse this is pretty straightforward: we grab a tonne of numbers and look for a hypothesis. I’ll go for the mean, median, and standard deviation here. That’s the minimum I need to check for a Gaussian distribution.

Samples = 1000001
Mean    = 5.61148
Std.Dev = 1.40887
Median  = 5.47287

Looks like there’s a slight skew downwards, but it’s not that bad. So I’ll propose that the output of this black box follows a Gaussian distribution, with mean 5.612 and standard deviation 1.409, until I can think of a better hypothesis which handles the skew.

After we reset for the inferential analysis, we immediately run into a problem: this is a black box. We know it has no input, and outputs a floating-point number, and that’s it. How can we form any hypothesis, let alone a null and alternative? We’ve no choice but to make something up. I’ll set my null to be “the black box outputs a random floating-point number,” and the alternative to “the output follows a Gaussian distribution with a mean of 0 and a standard deviation of 1.” Turn the crank, aaaand…

Samples            = 1000001
log(Bayes Factor)  = 26705438.01142
  (That means the most likely hypothesis is H1 (Gaussian distribution, mean = 0, std.dev = 1))

Unsurprisingly, our alternative does a lot better than our null. But our alternative is wrong! We’d get that impression pretty quickly if we watched the numbers streaming in. There’s an incredible temptation to take that data to refine or propose a new hypothesis, but that’s an abductive move. Inference is really letting us down.

Worse, this black box isn’t too far off from the typical science experiment. It’s rare any researcher is querying a black box, true, but it’s overwhelmingly true that they’re generating new data without incorporating other people’s datasets. It’s also rare you’re replicating someone else’s work; most likely, you’re taking existing ideas and rearranging them into something new, so prior findings may not carry forward. Inferential analysis is more tractable than I painted it, I’ll confess, but the limited information and focus on novelty still favors the abductive approach.

But think a bit about what I did on the inferential side: I picked two hypotheses and pitted them against one another. Do I have to limit myself to two? Certainly not! Let’s rerun the analysis with twenty-two hypotheses: the flat distribution we used as a null before, plus twenty-one alternative hypotheses covering every integral mean from -10 to 10 (though keeping the standard deviation at 1).

Samples                                 = 100001
log(likelihood*prior), H0               = -4436161.89971
log(likelihood*prior), H1, mean = -10   = -12378220.82173
log(likelihood*prior), H1, mean =  -9   = -10866965.39358
log(likelihood*prior), H1, mean =  -8   = -9455710.96544
log(likelihood*prior), H1, mean =  -7   = -8144457.53730
log(likelihood*prior), H1, mean =  -6   = -6933205.10915
log(likelihood*prior), H1, mean =  -5   = -5821953.68101
log(likelihood*prior), H1, mean =  -4   = -4810703.25287
log(likelihood*prior), H1, mean =  -3   = -3899453.82472
log(likelihood*prior), H1, mean =  -2   = -3088205.39658
log(likelihood*prior), H1, mean =  -1   = -2376957.96844
log(likelihood*prior), H1, mean =   0   = -1765711.54029
log(likelihood*prior), H1, mean =   1   = -1254466.11215
log(likelihood*prior), H1, mean =   2   = -843221.68401
log(likelihood*prior), H1, mean =   3   = -531978.25586
log(likelihood*prior), H1, mean =   4   = -320735.82772
log(likelihood*prior), H1, mean =   5   = -209494.39958
log(likelihood*prior), H1, mean =   6   = -198253.97143
log(likelihood*prior), H1, mean =   7   = -287014.54329
log(likelihood*prior), H1, mean =   8   = -475776.11515
log(likelihood*prior), H1, mean =   9   = -764538.68700
log(likelihood*prior), H1, mean =  10   = -1153302.25886
  (That means the most likely hypothesis is H1 (Gaussian distribution, mean = 6, std.dev = 1))

Aha, the inferential approach has finally gotten us somewhere! It’s still wrong, but you can see the obvious solution: come up with as many hypotheses as you can to explain the data, before we look at it, and run them all as the data rolls in. If you’re worried about being swamped by hypotheses, I’ve got a word for you: marginalization. Bayesian statistics handles hypotheses with parameters by integrating over all of them; you can think of these as composites, a mash of point hypotheses which collectively do a helluva lot better at prediction than any one hypothesis in isolation. In practice, then, Bayesians have always dealt with large numbers of hypotheses simultaneously.

The classic example of this is conjugate priors, where we carefully combine hyperparameters to evaluate a potentially infinite family of probability distributions. In fact, let’s try it right now: the proper conjugate here is the Normal-Inverse-Gamma, as we’re tracking both the mean and standard deviation of Gaussian distributions.

Samples = 1000001
μ       = 5.61148
λ       = 1000001.00000
α       = 500000.50000
β       = 992457.82655

median  = 5.47287

That’s a good start: μ lines up with the mean we calculated earlier, and λ is obviously the sample count. The shape of the posterior is still pretty opaque, though; we’ll need to chart it by evaluating the Normal-Inverse-Gamma PDF at a grid of points.

[Figure: Conjugate posterior for the collection of all Gaussian distributions which could describe the data.]

Excellent, the inferential method has caught up to abduction! In fact, as of now they’re both working identically. Think: what’s the difference between a hypothesis you proposed before collecting the data, and one you proposed after? In frequentism, the stopping problem implies that we could exit early and falsely reject our null, when data coming down the pipe would have pushed it back to “fail to reject.” There, the choice of hypothesis could have an influence on the outcome, so there is a difference between the two cases. This is made worse by frequentism’s obsession with one hypothesis above all others, the null.

Bayesian statistics is free of that problem, because every hypothesis is judged on its relative likelihood in reference to a dataset shared by all hypotheses. There is no stopping problem baked into the methodology. Whether I evaluate any given hypothesis before or after I collect the data is irrelevant, because either way it has to cope with all the data. This also frees me up to invent hypotheses whenever I wish.

But this also defeats the main attack against falsification. The whole point of invoking abduction was to save us from asserting any hypotheses in the beginning; if there’s no difference in when we invoke our hypotheses, however, then falsification might still apply.

Here’s where I return to giving Professor Moriarty a hand. He began that video by saying scientists usually don’t engage in falsification, hence it cannot be The Scientific Method, but ended it by approvingly quoting Feynman: “We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” Isn’t that falsification, right there?

This is yet another area where frequentist and Bayesian statistics diverge. As I pointed out earlier, frequentism is obsessed with falsifying the null hypothesis and trying to prove it wrong. Compare and contrast with what past-me wrote about Bayes Factors:

If data comes up that doesn’t square well with a hypothesis, its certainty takes a hit. But if we’re comparing it to another hypothesis that also doesn’t predict the data, the Bayes Factor will remain close to 1 and our certainties won’t shift much at all. Likewise, if both hypotheses strongly predict the data, the Factor again stays close to 1. If we’re looking to really shift our certainty around, we need a big Bayes Factor, which means we need to find scenarios where one hypothesis strongly predicts the data while the other strongly predicts this data shouldn’t happen.

Or, in other words, we should look for situations where one theory is… false. That sounds an awful lot like falsification!
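
The quoted reasoning boils down to a ratio, so a toy version makes the three situations concrete. The probabilities are made up for illustration, and `bayesFactor` is just a named division.

```go
package main

import "fmt"

// bayesFactor is the ratio of how strongly two hypotheses predicted the
// same observation: p(data|H1) / p(data|H2). Values near 1 leave our
// certainties where they were; values far from 1 shift them.
func bayesFactor(pGivenH1, pGivenH2 float64) float64 {
	return pGivenH1 / pGivenH2
}

func main() {
	// Made-up probabilities, purely for illustration.
	cases := []struct {
		label              string
		pGivenH1, pGivenH2 float64
	}{
		{"neither hypothesis predicted it", 0.01, 0.01},
		{"both hypotheses predicted it", 0.95, 0.95},
		{"H1 predicted it, H2 said it shouldn't happen", 0.95, 0.01},
	}
	for _, c := range cases {
		fmt.Printf("%-46s Bayes Factor = %.1f\n", c.label, bayesFactor(c.pGivenH1, c.pGivenH2))
	}
}
```

It should print Bayes Factors of 1.0, 1.0 and 95.0. Only the last situation, where one hypothesis predicted the data and the other said it shouldn't happen, moves our certainty much.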

But it’s not the same thing. Scroll back up to that Normal-Inverse-Gamma PDF, and pick a random point on the graph. The likelihood at that point is less than the likelihood at the maximum point. If you were watching those two points as we updated with new data, your choice would have gradually gone from about equally likely to substantially less likely. Your choice is more likely to be false, all things being equal, but it’s also not false with a capital F. Maybe the first million data points were a fluke, and if we continued sampling to a billion your choice would roar back to the top? This is the flip-side of having no stopping problem: the door is always left open a crack for any crackpot hypotheses to make a comeback.

Now look closely at the scale of the vertical axis. That maximal likelihood is well above 100%! In fact it’s somewhere around 4,023,000% by my calculations. While the vast majority are dropping downwards, there’s an ever-shrinking huddle of points that are becoming more likely as data is added! Falsification should only make things less likely, however.

Under Bayesian statistics, falsification is treated as a heuristic rather than a core part of the process. We’re best served by trying to find areas where hypotheses differ, yet we never declare one hypothesis to be false. This saves Moriarty: he’s correct both in disclaiming falsification and in endorsing the process of trying to prove yourself wrong. The confusion between the two stems from having to deal with two separate paradigms that appear to have substantial overlap, even though a closer look reveals fundamental differences.

“Aggressive, unpredictable, unreliable”

It’s funny, Trump didn’t use to be this opposed to Iran. Now, between all the domestic scandals he faces, his love of military power, and the warmongering far-right, he’s decided to reverse course and get aggressive with Iran.

“It is clear to me that we cannot prevent an Iranian nuclear bomb under the decaying and rotten structure of the current agreement,” Trump said from the White House Diplomatic Room. “The Iran deal is defective at its core. If we do nothing we know exactly what will happen.” In announcing his decision, Trump said he would initiate new sanctions on the regime, crippling the touchstone agreement negotiated by his predecessor. Trump said any country that helps Iran obtain nuclear weapons would also be “strongly sanctioned.”
“This was a horrible one-sided deal that should have never, ever been made,” the President said. “It didn’t bring calm, it didn’t bring peace, and it never will.” … “At the point when the US had maximum leverage, this disastrous deal gave this regime — and it’s a regime of great terror — many billions of dollars, some of it in actual cash — a great embarrassment to me as a citizen,” Trump said.

One problem: what are the consequences of withdrawing? Iran’s nuclear program was going fine when they were under earlier sanctions, so imposing sanctions isn’t going to have much effect. As for the political situation within Iran,

Sadeq Zibakalam, a prominent political commentator and professor of politics at Tehran University, struck a pessimistic tone about the consequences of Trump’s decision in Iran. “Many people are worried about war,” he told the Guardian on phone from Tehran. “Whenever the country faces a crisis in its foreign policy or economy, the situation gets better for hardliners, they’d be able to exert their force more easily.”

He added: “At the same time, hardliners will gain politically from this situation, because they’ll attack reformists and moderates like [President] Rouhani that this is evidence of what they had been saying for years, that the US cannot be trusted, and that US is always prepared to knife you in the back.”

Zibakalam, who is close to the reformists, said he did not think it would take long for Europeans and other nations to follow in the footsteps of the US, because they won’t endanger their economic ties with Washington, which would outweigh the benefits of doing business with Iran.

Rouhani has taken an aggressive stance to jump in front of the hardliners.

“This is a psychological war, we won’t allow Trump to win… I’m happy that the pesky being has left the Barjam,” he said referring to Persian acronym for JCPOA or the nuclear deal.

“Tonight we witnessed a new historic experience… for 40 years we’ve said and repeated that Iran always abides by its commitments, and the US never complies, our 40-year history shows us Americans have been aggressive towards great people of Iran and our region .. from the [1953] coup against the legitimate government of [Mohammad] Mosaddegh Mosadeq government and their meddling in the affairs of the last regime, support for Saddam [Hussein during Iran-Iraq war] and downing of our passenger plane by a US vessel and their actions in Afghanistan, in Yemen,” he said.

“What Americans announced today was a clear demonstration of what they have been doing for months. Since the nuclear deal, when did they comply? They only left a signature and made some statements, but did nothing that would benefit the people of Iran.”

Rouhani said the International Atomic Energy Agency (the IAEA) has verified that Tehran has abided by its obligations under the deal. “This is not an agreement between Iran and the US… for US to announce it’s pulling out, it’s a multilateral agreement, endorsed by the UN security council resolution 2231, Americans officially announcement today showed that their disregard for international commitments.. We saw that in their disregard for Paris agreement..

“Our people saw that the only regime that supports Trump is the illegitimate Zionist regime, the [s]ame regime that killed our nuclear scientists”

“From now on, this is an agreement between Iran and five countries… from now on the P5+1 has lost its 1… we have to wait and see how others react. If we come to the conclusion that with cooperation with the five countries we can keep what we wanted despite Israeli and American efforts, Barjam can survive,” he said referring to Persian acronym for JCPOA or the nuclear deal.

“We had already come to the conclusion that Trump will not abide by international commitments and won’t respect Barjam.”

And the other signers to the Iran deal are keeping a stiff upper lip, at least for now.

According to the IAEA, Iran continues to abide by the restrictions set out by the JCPoA, in line with its obligations under the Treaty on the Non-Proliferation of Nuclear Weapons. The world is a safer place as a result. Therefore we, the E3, will remain parties to the JCPoA. Our governments remain committed to ensuring the agreement is upheld, and will work with all the remaining parties to the deal to ensure this remains the case including through ensuring the continuing economic benefits to the Iranian people that are linked to the agreement.

Most commentators are united in calling the withdrawal a prelude to disaster. Most Americans were fine with the Iran deal. Most of the world is starting to get on board this train:

Last year, on a reporting trip through a few European capitals, something I heard over and over from European foreign policy officials: We remember 2003, and we’re starting to think this is the real America. Aggressive, unpredictable, unreliable, and dangerous.

Checking in on Local Politics

I’m ridiculously bad at following local politics; the American Implosion is far too addicting. Still, it does allow me to cross-reference between our two countries, and watch Southern trends migrate up North. For instance:

In a speech about women in politics at the United Conservative Party founding annual general meeting in Red Deer Saturday, [Heather] Forsyth expressed disbelief that women face structural barriers and are marginalized. “How the heck do you expect to get women involved in politics and get them excited when you have to read that socialist crap,” she said as a number of UCP members hooted and clapped. “When I ran in the nomination, which was one of the most hotly-contested nominations in the province, I didn’t play the ‘oh, poor me’ card. Nor did I play the ‘I’m a woman and they should provide me with a hand-up.’ ”

Forsyth … also criticized Prime Minister Justin Trudeau and Alberta Premier Rachel Notley for having gender-balanced cabinets. “I honestly would be in trouble if someone asked me to name all the women in their cabinet and I would have trouble even trying to remember five,” she said. “I quite frankly find it humiliating and I find it patronizing that we as women can’t do it. And we can do it on our own and by ourselves.”

Self-hating conservative women? Yep, we’ve got that. We’ve also imported blatant hypocrisy; I can just imagine the reporter smiling as they tacked this bit on:

Forsyth’s message was in stark contrast to interim Conservative Leader Rona Ambrose, who used her speech to announce a new initiative to overcome the barriers Forsyth dismissed. The non-profit, which also involves Laureen Harper, will encourage and mentor women who want to run for the UCP in next year’s election. …

Ambrose said UCP Leader Jason Kenney has made it a priority to attract women and LGBTQ candidates to the party and meets constantly with future prospects. “I’m here to push that message forward,” Ambrose said.

She acknowledged that harassment on social media is one barrier women face. She said female politicians should have staff monitor feeds on Facebook and Twitter to create a buffer.  Ambrose said Twitter, in particular, is “a sewer for women.” “They need to make a lot of changes before it is a safe place for women,” she said.

What’s depressing is that Alberta has a liberal-ish government, with the notable exception of oil pipelines. The NDP have done a great job since coming to power, but their election was due to a divided and squabbling opposition. Now that conservatives in the province have united under the United Conservative Party, however, they’re likely to regain control. This is terrifying, because they “united” by essentially rolling over and handing the keys to the social conservatives. Abortion services are back to being controversial; their leader is fine with harming LGBTQ youth and exploiting them to whip up the base. He’s also opposed to environmental regulations. A party member who fired a woman for filing a sexual harassment complaint is still in charge of Democracy and Accountability. Hell, their shadiness even extends to their own voting procedures.

Maybe it’s time I paid more attention. I bet the NDP could use some volunteers next year.


Speak of the devil, and Twitter shall appear:

Kathleen Smith: UCP just passed a resolution to out teens who join a GSA. And they wonder why won’t let them march under their party banner in the parade.

Marni Panas: Every progressive who is still part of this party should be ashamed of themselves. Any member of the LGBTQ community who is part of this party should take a long look in the mirror. Disgusting resolution.

Marni Panas: The and pay more attention to anti-lgbtq activist John Carpay than the actual kids who will be harmed by being outed in GSA. I’m sick.

: Holy crap. This is what’s happening now at the . If you’ve lost track, they’ve gone from attack teens to attack & teachers to attack First Nation’s persons in a matter of minutes.

Marni Panas: This is what happens when racists and homophobes are emboldened by leaders like Trump and Kenney. I no longer want to hear “this would never happen in Canada. We’re better than that.” It’s happening right now in a hotel in Red Deer, Alberta.

This is all happening as the UCP is explicitly doing outreach to the LGBT community. They’re a bunch of two-faced bigots.

The Two Cultures, as per C. P. Snow

I’d never heard of C.P. Snow until Steven Pinker brought him up, but apparently he’s quite a big deal. Much of it stems from a lecture Snow gave nearly sixty years ago. It’s been discussed and debated (funny meeting you here, Lawrence Krauss) to the point that I, several generations and one ocean away, can grab a reprint of the original with an intro about as long as the lecture itself.

Snow’s core idea is this: two types of intellectuals, scientists and elite authors, don’t talk with one another and are largely ignorant of each other’s work. His quote about elite authors being ignorant of physics is plastered everywhere, so I’d like to instead repeat what he said about scientists being ignorant of literature:

As one would expect, some of the very best scientists had and have plenty of energy and interest to spare, and we came across several who had read everything that literary people talk about. But that’s very rare. Most of the rest, when one tried to probe for what books they had read, would modestly confess “Well, I’ve tried a bit of Dickens”, rather as though Dickens were an extraordinarily esoteric, tangled and dubiously rewarding writer, something like Rainer Maria Rilke. In fact that is exactly how they do regard him: we thought that discovery, that Dickens had been transformed into the type-specimen of literary incomprehensibility, was one of the oddest results of the whole exercise. […]

Remember, these are very intelligent men. Their culture is in many ways an exacting and admirable one. It doesn’t contain much art, with the exception, and important exception, of music. Verbal exchange, insistent argument. Long-playing records. Colour photography. The ear, to some extent the eye. Books, very little, though perhaps not many would go so far as one hero, who perhaps I should admit was further down the scientific ladder than the people I’ve been talking about – who, when asked what books he read, replied firmly and confidently: “Books? I prefer to use my books as tools.” It was very hard not to let the mind wander – what sort of tool would a book make? Perhaps a hammer? A primitive digging instrument?

[Snow, Charles P. “The two cultures.” (1959): pg. 6-7]

To be honest, I have a hard time comprehending why the argument exists. If I were to transpose it to my place and time, it would be like complaining that Margaret Atwood, Alice Munro, and Michael Ondaatje are shockingly ignorant of basic physics, while if you were to quiz famous Canadian scientists about Canadian literature you’d eventually drag out a few mentions of Farley Mowat. I… don’t see the problem? Yes, it would be great if more people knew more things, but if you want to push the frontiers of knowledge you’ve got to focus on the specifics. Given that your time is (likely) finite, that means sacrificing some general knowledge. It would be quite ridiculous to ask a specialist in one field to explain the specifics of another.

If we forget the scientific culture, then the rest of western intellectuals have never tried, wanted, or been able to understand the industrial revolution, much less accept it. Intellectuals, in particular literary intellectuals, are natural Luddites. [pg. 11-12]

The academics had nothing to do with the industrial revolution; as Corrie, the old Master of Jesus, said about trains running into Cambridge on Sunday, ‘It is equally displeasing to God and to myself’. So far as there was any thinking in nineteenth-century industry, it was left to cranks and clever workmen. American social historians have told me that much the same was true of the U.S. The industrial revolution, which began developing in New England fifty years or so later than ours, apparently received very little educated talent, either then or later in the nineteenth century. [pg. 12]

… do we understand how they have happened? Have we begun to comprehend even the old industrial revolution? Much less the new scientific revolution in which we stand? There never was any thing more necessary to comprehend. [pg. 14]

Yep, that’s Snow trashing authors of high fiction for not having an understanding of the Industrial Revolution. It’s not an isolated case, either; Snow also criticises Cambridge art graduates for not being aware of “the human organisation” behind buttons [pg. 15]. He might as well have spent several paragraphs yelling at physicists for being unable to explain why Holden Caulfield wanted to be a gas station attendant; he’s that far from reality.

Which gets us to the real consequences of Snow’s divide, and how he proposes heading them off at the pass.

To say we have to educate ourselves or perish, is a little more melodramatic than the facts warrant. To say, we have to educate ourselves or watch a steep decline in our own lifetime, is about right. We can’t do it, I am now convinced, without breaking the existing pattern. I know how difficult this is. It goes against the emotional grain of nearly all of us. In many ways, it goes against my own, standing uneasily with one foot in a dead or dying world and the other in a world that at all costs we must see born. I wish I could be certain that we shall have the courage of what our minds tell us. [pg. 20]

This disparity between the rich and the poor has been noticed. It has been noticed, most acutely and not unnaturally, by the poor. Just because they have noticed it, it won’t last for long. Whatever else in the world we know survives to the year 2000, that won’t. Once the trick of getting rich is known, as it now is, the world can’t survive half rich and half poor. It’s just not on.

The West has got to help in this transformation. The trouble is, the West with its divided culture finds it hard to grasp just how big, and above all just how fast, the transformation must be. [pg. 21-22]

So we need to educate scientists about the works of elite authors, and those authors about the work of scientists… because otherwise Britain will become impoverished, and/or we’d end poverty faster?! That doesn’t square up with the data. Let’s look at what the government of Namibia, a well-off African country, thinks will help end poverty.

  • Improving access to Community Skills Development Centres (Cosdecs) in remote areas and aligning the curriculum with that of the Vocational Training Centres.
  • To improve career options and full integration into the modern economy, there is need to introduce vocational subjects at upper primary and junior secondary levels. This will facilitate access to vocational education and labour market readiness by the youth.
  • Improving productivity of the subsistence agriculture by encouraging the use of both traditional and modern fertiliser and by providing information on modern farming methods.
  • The dismantling of the “Red Line” seems to hold some promise for livestock farmers in the North who were previously prevented access to markets outside of the northern regions.
  • Consider establishing a third economic hub for Namibia to relief Khomas and Erongo from migration pressure. With abundant water resources, a fertile land and being along the Trans Zambezi Corridor, Kavango East is a good candidate for an agricultural capital and a logistic growth point.
  • Given persistent drop-out rates especially in remote rural areas, there is need for increased access to secondary education by addressing both the distance and the quality of education.
  • Educate youth on the danger of adolescence pregnancy both in terms of exclusion from the modern economy and health implications.
  • Given the established relationship between access to services, poverty and economic inclusion, there is need for government to strive towards a regional balanced provision of access to safe drinking water, sanitation, electricity and housing. [pg. 58-59]

I don’t see any references to science in there, nor any to Neshani Andreas or Joseph Diescho. Britain’s the same story. But who knows, maybe an author/chemist who thought world poverty would end by the year 2000 has a better understanding of poverty than government agencies and century-old NGOs tasked with improving social conditions.

There’s a greater problem here, too. Let’s detour to something Donald Trump said:

Trump: “The Democrats don’t care about our military. They don’t.” He says that is also true of the border and crime

How would we prove that Democrats don’t care about the military, the US border, or crime? The easiest approach would be to look at their national platform and see if those things are listed there (they’re not, I checked). A much harder one would be to parse their actions instead. If we can find a single Democrat who does care about crime, then we’ve refuted the claim in the deductive sense.

But there’s still an inductive way to keep it alive: if “enough” Democrats don’t care about those things, then Trump can argue he meant the statements informally and thus it’s still true-ish. That’s a helluva lot of work, and since the burden is on the person making the claim it’s not my job to run around gathering data for Trump’s argument. If I’m sympathetic to Trump’s views or pride myself in being intellectually “fair,” however, there’s a good chance I’d do some of his homework anyway.

Lurking behind all of the logical stuff, however, is an emotional component. The US-Mexico border, the military, and crime all stir strong emotions in his audience; by positioning his opponents as being opposed to “positive” things, at the same time implying that he’s in favour of them, Trump’s angering his audience and motivating them to be less charitable towards his opponents.

That’s the language of hate: emotionally charged false statements about a minority, to be glib. It’s all the more reason to be careful when talking about groups.

The non-scientists have a rooted impression that the scientists are shallowly optimistic, unaware of man’s condition. On the other hand, the scientists believe that the literary intellectuals are totally lacking in foresight, peculiarly unconcerned with their brother men, in a deep sense anti-intellectual, anxious to restrict both art and thought to the existential moment. And so on. Anyone with a mild talent for invective could produce plenty of this kind of subterranean back-chat. [pg. 3]

If you side with either scientists or elite authors, this is emotionally charged language. At the same time, I have no idea how you’d even begin to prove half of that. Snow’s defence consists of quoting Ernest Rutherford and T.S. Eliot; all the rest comes from his experiences with “intimate friends among both scientists and writers” and “living among these groups and much more.” [pg. 1] Nonetheless, that small sample set is enough for Snow to assert “this is a problem of the entire West.” [pg. 2] Calling scientists or elite authors a minority is a stretch, but the net result is similar: increased polarisation between the two groups, and the promotion of harmful myths.

Yes, Snow would go on to propose a “third culture” which would bridge the gap, but if the gap doesn’t exist in the first place this amounts to selling you a cure after convincing you you’re sick.

What’s worse is that if you’re operating in a fact-deficient environment, you’ve got tremendous flexibility to tweak things to your liking. Is J.K. Rowling a “literary intellectual?” She doesn’t fit into the highbrow culture Snow was talking about, but she is a well-known and influential author who isn’t afraid to let her opinions be known (for better or worse). Doesn’t that make her a decision maker, worthy of inclusion? And if we’ve opened the door for non-elite authors, why not add other people from the humanities? Or social scientists?

This also means that one of the harshest critics of C. P. Snow is C. P. Snow.

I have been argued with by non-scientists of strong down-to-earth interests. Their view is that it is an over-simplification, and that if one is going to talk in these terms there ought to be at least three cultures. They argue that, though they are not scientists themselves, they would share a good deal of the scientific feeling. They would have as little use-perhaps, since they knew more about it, even less use-for the recent literary culture as the scientists themselves. …

I respect those arguments. The number 2 is a very dangerous number: that is why the dialectic is a dangerous process. Attempts to divide anything into two ought to be regarded with much suspicion. I have thought a long time about going in for further refinements: but in the end I have decided against. I was searching for something a little more than a dashing metaphor, a good deal less than a cultural map: and for those purposes the two cultures is about right, and subtilising any more would bring more disadvantages than it’s worth. [pg. 5]

He’s aware that some people regard “the two cultures” as an oversimplification, he recognises the problem with dividing people in two, and his response amounts to “well, I’m still right.” He’s working with such a deficiency of facts that he can undercut his own arguments and still keep making them as if no counter-argument existed.

I think it is only fair to say that most pure scientists have themselves been devastatingly ignorant of productive industry, and many still are. It is permissible to lump pure and applied scientists into the same scientific culture, but the gaps are wide. Pure scientists and engineers often totally misunderstand each other. Their behaviour tends to be very different: engineers have to live their lives in an organised community, and however odd they are underneath they manage to present a disciplined face to the world. Not so pure scientists. [pg. 16]

Snow makes a strong case for a third culture here, something he earlier said “would bring more disadvantages than it’s worth!” He’s seeing gaps and division everywhere, and defining things so narrowly that he can rattle off five counter-examples then immediately dismiss them (emphasis mine).

Almost everywhere, though, intellectual persons didn’t comprehend what was happening. Certainly the writers didn’t. Plenty of them shuddered away, as though the right course for a man of feeling was to contract out; some, like Ruskin and William Morris and Thoreau and Emerson and Lawrence, tried various kinds of fancies which were not in effect more than screams of horror. It is hard to think of a writer of high class who really stretched his imaginative sympathy, who could see at once the hideous back-streets, the smoking chimneys, the internal price—and also the prospects of life that were opening out for the poor, the intimations, up to now unknown except to the lucky, which were just coming within reach of the remaining 99.0 per cent of his brother men.

Snow himself mentions Charles Dickens earlier in the lecture, a perfect fit for the label of “a writer of high class who really stretched his imaginative sympathy.” And yet here, he has difficulty remembering that author’s existence.

It’s oddly reminiscent of modern conservative writing: long-winded, self-important, and with only a fleeting connection to the facts. No wonder his ideas keep getting resurrected by them: they can be warped and distorted to suit your current needs.

NO COLLUSION, right?

Trump’s been a broken record about collusion with the Kremlin. A small sampling from the last month:

It was a great report, no collusion, which I knew anyway, no coordination, no nothing. It’s a witch hunt, that’s all it is. There was no collusion with Russia, you can believe this one. She (Merkel) probably can’t believe it, who can? But the report was very powerful, very strong, there was no collusion between the Trump campaign and the Russian people. Cause I’ve said many times before, I’ve always said there was no collusion, but I’ve also said there has been nobody tougher on Russia than me.

Jennifer, I can say this — that there was no collusion, and that’s been so found, as you know, by the House Intelligence Committee. There’s no collusion. There was no collusion with Russia, other than by the Democrats — or, as I call them, the obstuctionists, because they truly are obstructionists.

James Comey Memos just out and show clearly that there was NO COLLUSION and NO OBSTRUCTION. Also, he leaked classified information. WOW! Will the Witch Hunt continue?

Much of the bad blood with Russia is caused by the Fake & Corrupt Russia Investigation, headed up by the all Democrat loyalists, or people that worked for Obama. Mueller is most conflicted of all (except Rosenstein who signed FISA & Comey letter). No Collusion, so they go crazy!

We get it, we get it, NO COLLUSION. Fine. Then I suppose Trump would have no problem answering these questions:

  1. When did you become aware of the Trump Tower meeting?
  2. What involvement did you have in the communication strategy [about that meeting], including the release of Donald Trump Jr.’s emails?
  3. During a 2013 trip to Russia, what communication and relationships did you have with the Agalarovs and Russian government officials?
  4. What communication did you have with Michael D. Cohen, Felix Sater and others, including foreign nationals, about Russian real estate developments during the campaign?
  5. What discussions did you have during the campaign regarding any meeting with Mr. Putin? Did you discuss it with others?
  6. What discussions did you have during the campaign regarding Russian sanctions?
  7. What involvement did you have concerning [the] platform changes regarding arming Ukraine [during the 2016 RNC convention]?
  8. What knowledge did you have of any outreach by your campaign, including by Paul Manafort, to Russia about potential assistance to the campaign?
  9. What did you know about communication between Roger Stone, his associates, Julian Assange or WikiLeaks?
  10. What did you know during the transition about an attempt to establish back-channel communication to Russia, and Jared Kushner’s efforts?
  11. What do you know about a 2017 meeting in Seychelles involving Erik Prince?
  12. What do you know about a Ukrainian peace proposal provided to Mr. Cohen in 2017?

Because according to the New York Times, those and many others are on the Special Counsel’s wish list. That should be easy enough, after all there’s NO COLLUSION and Robert Mueller’s long since requested to interview-

… oh wait, Trump is refusing to sit down with Mueller’s team and answer their questions? Strange. Come to think of it, why do we have these questions at all? The Special Counsel investigation is airtight; the only reason we have any information on what they’re doing is that the people they interview keep blabbing to the press. Trump’s “team” of lawyers would be the only people privy to those questions, other than Trump, and there’s no way they’d leak something like this. Oh well, I guess it’ll be up to the press to ask-

… ooohhh. I see what you did there, Mueller.


[HJH 2018-05-01] … or maybe not?

So disgraceful that the questions concerning the Russian Witch Hunt were “leaked” to the media. No questions on Collusion. Oh, I see…you have a made up, phony crime, Collusion, that never existed, and an investigation begun with illegally leaked classified information. Nice!

That’s Trump, using “leaked” in scare quotes. And apparently looking at an entirely different list of questions than I am. He’s smart enough to realize that collusion is not a crime, yet unaware that foreign contributions to US elections are a crime and that the US president can be impeached for non-crimes.

But if Trump’s advisers can spin that infamous Fox and Friends interview into a positive, maybe they thought slipping this list to reporters was another win for Trump. Somehow.


Kudos to Matthew Rozsa over at Salon for filling in some of the “somehow.”

[Margaret] Hartmann also noted that there were three prevailing theories as to why the questions were leaked (and most likely by someone either currently or formerly associated with Trump’s team): to convince Trump not to speak with Mueller, to turn the public against the Mueller investigation or to persuade the Republican-controlled Congress that Mueller is getting too close to the president and needs to be stopped. […]

[Norm Eisen:] I helped witnesses decide whether to talk to prosecutors for decades. Now that we know Mueller’s questions, I believe that Trump is unlikely ever to answer them unless subpoenaed–and then his answer will be “I take the Fifth.” No wonder Dowd quit. […]

“The very fact that the questions are out there, my first reaction is that it could be an act of obstruction just to have released these questions,” John Dean, the former White House counsel to President Richard Nixon, told Anderson Cooper. He elaborated that they may have been released “to try to somehow disrupt the flow of information, the tipping off of a witness in advance as to what the questions are going to be.”

Rozsa also has a good summary of the evidence that Trump’s team leaked the questions.


[HJH 2018-05-02] OK, I think we’ve finally cracked this one.

In the wake of the testy March 5 meeting, Mueller’s team agreed to provide the president’s lawyers with more specific information about the subjects that prosecutors wished to discuss with the president. With those details in hand, Trump lawyer Jay Sekulow compiled a list of 49 questions that the team believed the president would be asked, according to three of the four people, who spoke on the condition of anonymity because they were not authorized to talk publicly. […]

After investigators laid out 16 specific subjects they wanted to review with the president and added a few topics within each one, Sekulow broke the queries down into 49 separate questions, according to people familiar with the process. […]

For his part, Trump fumed when he saw the breadth of the questions that emerged out of the talks with Mueller’s team, according to two White House officials. The president and several advisers now plan to point to the list as evidence that Mueller has strayed beyond his mandate and is overreaching, they said.

So this was just another front in the grand plan by the current President of the United States and high-ranking members of the Republican Party to discredit, distract, or damage the Special Counsel investigation. It’s gotten so bad even Trump’s supporters assume he’s guilty of something. I don’t think it worked, which shouldn’t be a surprise given the calibre of the talent surrounding Trump. The only person more blind is Trump himself.

Feeling the Research

Daryl Bem must be sick of those puns by now.

Back in 2011 he published Feeling the Future, a paper that combined multiple experiments on human precognition to argue it was a thing. Naturally this led to a flurry of replications, many of which riffed on his original title. I got interested via a series of blog posts I wrote that, rather surprisingly, used what he published to conclude precognition doesn’t exist.

I haven’t been Bem’s only critic, and one that’s a lot higher profile than I has extensively engaged with him both publicly and privately. In the process, they published Bem’s raw data. For months, I’ve wanted to revisit that series with this new bit of data, but I’m realising as I type this that it shouldn’t live in that Bayes 20x series. I don’t need to introduce any new statistical tools to do this analysis, for starters; all the new content here relates to the dataset itself. To make understanding that easier, I’ve taken the original Excel files and tossed them into a Google spreadsheet. I’ve re-organized the sheets in order of when the experiment was done, added some new columns for numeric analysis, and popped a few annotations in.

Odd Data

The first thing I noticed was that the experiments were not presented in the order they were actually conducted. It looks like he re-organized the studies to make a better narrative for the paper, implying he had a grand plan when in fact he was switching between experimental designs. This doesn’t affect the science, though, and while never stating the exact order, Bem hints at this reordering on pages three and nine of Feeling the Future.

What may affect the science are the odd timings present within many of the datasets. As Dr. R pointed out in an earlier link, Bem combined two 50-sample studies together for the fifth experiment in his paper, and three studies of 91, 19, and 40 students for the sixth. Pasting together studies like that is a problem within frequentist statistics, due to the “stopping problem.” Stopping early is bad, because random fluctuations may blow the p-value across the “statistically significant” line when additional data would have revealed a non-significant result; but stopping too late is also bad, because p-values tend to exaggerate the evidence against the null hypothesis and the problem gets worse the more data you add.
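To see why stopping early matters, here is a minimal simulation under assumptions of my own choosing, none of which come from Bem's studies: every trial is a fair coin flip (so the null hypothesis is true), significance is a two-sided z-test at the 5% level, and the "peeking" experimenter checks every 10 trials and stops at the first significant result. Scoring the same simulated data both ways shows how much more often the peeker declares an effect that isn't there.

```python
import math
import random


def z_significant(successes: int, n: int, crit: float = 1.96) -> bool:
    """Two-sided z-test of an observed proportion against p = 0.5."""
    z = (successes - 0.5 * n) / math.sqrt(0.25 * n)
    return abs(z) > crit


def false_positive_rates(sims: int = 2000, n_max: int = 200,
                         peek_every: int = 10, seed: int = 1) -> tuple[float, float]:
    """Simulate experiments where the null is true (each trial is a fair coin).

    Returns (rate for a single test at n_max, rate when peeking every
    `peek_every` trials and stopping at the first significant result).
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(sims):
        successes = 0
        stopped = False
        for n in range(1, n_max + 1):
            if rng.random() < 0.5:
                successes += 1
            # The optional stopper would quit the first time a peek crosses
            # the line; we keep flipping so the same data can also be scored
            # by the experimenter who tests only once, at n_max.
            if not stopped and n % peek_every == 0 and z_significant(successes, n):
                stopped = True
        peek_hits += stopped
        fixed_hits += z_significant(successes, n_max)
    return fixed_hits / sims, peek_hits / sims


fixed, peeking = false_positive_rates()
print(f"single test at n=200: {fixed:.1%}, peeking every 10 trials: {peeking:.1%}")
```

Because n = 200 is itself a peek point, the peeker's false-positive rate can never be lower than the fixed tester's; the interesting part is how far above it lands.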

But when poring over the datasets, I noticed additional gaps and oddities that Dr. R missed. Each dataset has a timestamp for when subjects took the test, presumably generated by the hardware or software. These subjects were undergrad students at a college, and grad students likely administered some or all of the tests. So we’d expect subject timestamps to be largely Monday-to-Friday affairs in a continuous block. Since these are machine generated or copy-pasted from machine-generated logs, we should see a monotonic increase.

Yet the 91-subject run that makes up part of the sixth experiment has a three-month gap after subject #50. Presumably the summer break prevented Bem from finding subjects, but what sort of study runs for a month, stops for three, then carries on for one more? On the other hand, that logic rules out all forms of replication. If the experimental parameters and procedure did not change over that time-span, either by the researcher’s hand or due to external events, there’s no reason to think the later subjects differ from the earlier ones.

Look more carefully and you see that up until subject #49 there were several subjects per day, followed by a near two-week pause until subject #50 arrived. It looks an awful lot like Bem was aiming for fifty subjects during that time, was content when he reached forty-nine, then luck and/or a desire for even numbers made him add number fifty. If Bem was really aiming for at least 100 subjects, as he claimed in a footnote on page three of his paper, he could have easily added more than fifty, paused the study, and resumed in the fall semester. Most likely, he was aiming for a study of fifty subjects back then, suggesting the remaining forty-one were originally the start of a second study before later being merged.

Experiments 1, 2, 4, and 7 also show odd timestamps. Many of these can be explained by Spring Break or Thanksgiving holidays, but many also stop at round numbers. There are also instances where some timestamps occur out of order or the sequence number reverses itself. This is pretty strong evidence of human tampering, though “tampering” isn’t synonymous with “fraud;” any sufficiently large study will have mistakes, and any attempt to correct those mistakes will look like fraud. That still creates uncertainty in a dataset and necessarily lowers our trust in it.
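The timestamp checks described above are easy to automate. Below is a minimal sketch, assuming each run can be exported as (sequence number, timestamp) pairs in file order; the function name `audit_run`, the 7-day gap threshold, and the toy records are my own inventions, not anything from Bem's files. It flags sequence numbers or timestamps that go backwards, long pauses, and weekend sessions.

```python
from datetime import datetime, timedelta


def audit_run(records, max_gap=timedelta(days=7)):
    """Return warnings for a list of (sequence number, datetime) records."""
    warnings = []
    for (prev_seq, prev_ts), (seq, ts) in zip(records, records[1:]):
        if seq <= prev_seq:
            warnings.append(f"sequence number goes backwards at #{seq}")
        if ts < prev_ts:
            warnings.append(f"timestamp out of order at #{seq}")
        elif ts - prev_ts > max_gap:
            warnings.append(f"{(ts - prev_ts).days}-day gap before #{seq}")
    for seq, ts in records:
        if ts.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
            warnings.append(f"weekend session at #{seq}")
    return warnings


# Toy data only: an out-of-order entry, a summer-length pause, a Saturday session.
toy = [
    (1, datetime(2010, 3, 1, 10)),
    (2, datetime(2010, 3, 1, 11)),
    (3, datetime(2010, 3, 1, 9)),
    (4, datetime(2010, 6, 7, 10)),
    (5, datetime(2010, 6, 5, 10)),
]
for warning in audit_run(toy):
    print(warning)
```

A flag from a check like this is a prompt to go look, not proof of anything; as noted above, honest corrections produce the same fingerprints.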

I’ve also added stats for the individual runs, and some of them paint an interesting tale. Take experiment 2, for instance. As of the pause after subject #20, the success rate was 52.36%, but between subject #20 and #100 it was instead 51.04%. The remaining 50 subjects had a success rate of 52.39%, bringing the total rate up to 51.67%. Why did I place a division between those first hundred and last fifty? There’s no time-stamp gap there, and no sign of a parameter shift. Nonetheless, if we look at pages five and six of the paper, we find:

For the first 100 sessions, the flashed positive and negative pictures were independently selected and sequenced randomly. For the subsequent 50 sessions, the negative pictures were put into a fixed sequence, ranging from those that had been successfully avoided most frequently during the first 100 sessions to those that had been avoided least frequently. If the participant selected the target, the positive picture was flashed subliminally as before, but the unexposed negative picture was retained for the next trial; if the participant selected the nontarget, the negative picture was flashed and the next positive and negative pictures in the queue were used for the next trial. In other words, no picture was exposed more than once, but a successfully avoided negative picture was retained over trials until it was eventually invoked by the participant and exposed subliminally. The working hypothesis behind this variation in the study was that the psi effect might be stronger if the most successfully avoided negative stimuli were used repeatedly until they were eventually invoked.

So precisely when Bem hit a round number and found the signal strength was getting weaker, he tweaked the parameters of the experiment? That’s sketchy, especially if he peeked at the data during the pause at subject #20. If he didn’t, the parameter tweak is easier to justify, as he’d already hit his goal of 100 subjects and had time left in the semester to experiment. Combining both experimental runs would still be a no-no, though.
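The 51.67% total for experiment 2 is just the three segment rates weighted by segment size. A quick check, assuming every session had the same number of trials, so that weighting by subjects is equivalent to weighting by trials; `pooled_rate` is a name I made up for this sketch.

```python
def pooled_rate(segments):
    """Combine (subjects, success rate %) segments into one overall rate.

    Assumes every subject ran the same number of trials, so subject counts
    are valid weights.
    """
    total = sum(n for n, _ in segments)
    return sum(n * rate for n, rate in segments) / total


# Experiment 2, split at the pause after subject #20 and the protocol change at #100.
experiment_2 = [(20, 52.36), (80, 51.04), (50, 52.39)]
print(f"{pooled_rate(experiment_2):.2f}%")  # prints 51.67%
```

The arithmetic makes the point concrete: the modified protocol's final fifty subjects carry a third of the weight, enough to pull the combined rate noticeably above the 51.04% of the middle stretch.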

Uncontrolled Controls

Bem’s inconsistent use of controls was present in the paper, but it’s a lot more obvious in the dataset. In experiments 2, 3, 4, and 7 there is no control group at all. That is dangerous. If you run a control group through a protocol nearly identical to that of the experimental group, and you don’t get a null result, you’ve got good evidence that the procedure is flawed. If you don’t run a control group, you’d better be damn sure your experimental procedure has been proven reliable in prior studies, and that you’re following the procedure close enough to prevent bias.

Bem doesn’t clear that bar for experiments 2 and 7; the latter isn’t a replication of a prior study he’s carried out, and while the former is a replication of experiment 1, the earlier study was carried out two years before and appears to have been two separate sample runs pasted together, each with different parameters. In experiments 3 and 4, Bem’s comparing something he knows will have an effect (forward priming) with something he hopes will have an effect (retroactive priming). There’s no explicit comparison of the known effect’s size to that found in other studies; Bem’s write-up appears to settle for showing statistical significance. Merely showing there is an effect does not demonstrate that effect is of the same magnitude as expected.

Conversely, experiments 5 and 6 have a very large number of controls, relative to the experimental conditions. This is wasteful, certainly, but it could also throw off the analysis: since the confidence interval narrows as more samples are taken, we can tighten one side up by throwing more datapoints in and taking advantage of the p-value’s weakness.

Experiment 6 might show this in action. For the first fifty subjects, the control group was further from the null value than the negative image group, but not as extreme as the erotic image one. Three months later, the next forty-one subjects are further from the null value than both the experimental groups, but this time in the opposite direction! Here, Bem drops the size of the experimental groups and increases the size of the control group; for the next nineteen subjects, the control group is again more extreme than the negative image group and again less extreme than the erotic group, plus the polarity has flipped again. For the last forty subjects, Bem increased the sizes of all groups by 25%, but the control is again more extreme and the polarity has flipped yet once more. Nonetheless, adding all four runs together allows all that flopping to cancel out, and Bem to honestly write “On the neutral control trials, participants scored at chance level: 49.3%, t(149) = -0.66, p = .51, two-tailed.” This looks a lot like tweaking parameters on-the-fly to get a desired outcome.

It also shows there’s substantial noise in Bem’s instruments. What are the odds that the negative image group success rate would show less variance than the control group, despite having anywhere from a third to a sixth of the sample size? How can their success rate show less variance than the erotic image group, despite having the same sample size? These scenarios aren’t impossible, but with them coming at a time when Bem was focused on precognition via negative images it’s all quite suspicious.
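For a rough sense of scale: under a binomial model the standard deviation of an observed success rate is sqrt(p(1-p)/n), which is also why confidence intervals narrow as samples pile up. A group with a third of the trials should therefore wobble about 1.7 times as much as the control, and one with a sixth about 2.4 times as much. A minimal sketch of that arithmetic; the 600-trial control size is a placeholder, not one of Bem's counts, and only the ratios matter.

```python
import math


def rate_sd(n_trials: int, p: float = 0.5) -> float:
    """Standard deviation of an observed success rate under a binomial model."""
    return math.sqrt(p * (1 - p) / n_trials)


control_trials = 600  # hypothetical placeholder; the ratios don't depend on it
for divisor in (3, 6):
    ratio = rate_sd(control_trials // divisor) / rate_sd(control_trials)
    print(f"1/{divisor} the trials -> {ratio:.2f}x the spread")
```

Seeing the smaller group repeatedly land closer to 50% than the larger one runs against that expectation, which is what makes the pattern worth flagging.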

The Control Isn’t a Control

All too often, researchers using frequentist statistics get blinded by the way p-values ignore the null hypothesis, and don’t bother checking their control groups. Bem’s fairly good about this, but we can do better.

All of Bem’s experiments, save 3 and 4, rely on Bernoulli processes; every person has some probability of guessing the next binary choice correctly, due possibly to inherent precognitive ability, and that probability does not change with time. It follows that the number of successful guesses has a binomial distribution, which can be written:

P(s \mid p, f) ~=~ \frac{(s+f)!}{s!\,f!}\, p^{s} (1-p)^{f}

where s is the number of successes, f the number of failures, and p the probability of success; that means P(s | p, f) translates to “the probability of having s successes, given the probability of success is p and there were f failures.” Naturally, p must be between 0 and 1.

Let’s try a thought experiment: say you want to test if a single six-sided die is biased to come up 1. You roll it thirty-six times, and observe four instances where it comes up 1. Your friend tosses it seventy-two times, and spots fifteen instances of 1. You’d really like to pool your results together and get a better idea of how fair the die is; how would you do this? If you answered “just add all the successes together, as well as the failures,” you nailed it!

The probability distribution of rolling a 1 for a given die, according to you and your friend's experiments.

The results look pretty good; both you and your friend would have suspected the die was biased based on your individual rolls, but the combined distribution looks like what you’d expect from a fair die.
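The pooling step can be checked directly against the binomial formula. A minimal sketch: `binomial_pmf` is a straight transcription of P(s | p, f), with the standard library's `math.comb` supplying (s+f)!/(s! f!), and the hypothetical helper `peak` does a crude grid search for the value of p each set of rolls favours most.

```python
from math import comb


def binomial_pmf(s: int, f: int, p: float) -> float:
    """P(s | p, f) = (s+f)! / (s! f!) * p^s * (1-p)^f."""
    return comb(s + f, s) * p**s * (1 - p) ** f


def peak(s: int, f: int, steps: int = 1000) -> float:
    """Grid search for the p that maximises P(s | p, f)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: binomial_pmf(s, f, p))


you = (4, 32)      # four 1s in thirty-six rolls
friend = (15, 57)  # fifteen 1s in seventy-two rolls
combined = (you[0] + friend[0], you[1] + friend[1])  # just add them up

for label, (s, f) in [("you", you), ("friend", friend), ("combined", combined)]:
    print(f"{label:>8}: peak near p = {peak(s, f):.3f}, exact s/(s+f) = {s / (s + f):.3f}")
print(f"fair die: p = {1 / 6:.3f}")
```

Your rolls favour a die that under-produces 1s, your friend's one that over-produces them, and the pooled peak lands much closer to 1/6 than either.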

But my Bayes 208 post was on conjugate distributions, which defang a lot of the mathematical complexity that comes from Bayesian methods by allowing you to merge statistical distributions. Sit back and think about what just happened: both you and your friend examined the same Bernoulli process, resulting in two experiments and two different binomial distributions. When we combined both experiments, we got back another binomial distribution. The only way this differs from Bayesian conjugate distributions is the labeling; had I declared your binomial to be the prior, and your friend’s to be the likelihood, it’d be obvious the combination was the posterior distribution for the odds of rolling a 1.

Well, almost the only difference. Most sources don’t list the binomial distribution as the conjugate for this situation, but instead the Beta distribution:

\mathrm{Beta}(p \mid \alpha, \beta) ~=~ \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}

But I think you can work out the two are almost identical, without any help from me. The only real advantage of the Beta distribution is that it allows non-integer successes and failures, thanks to the Gamma function, which in turn permits a nice selection of priors.
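Both claims, that combining experiments is a conjugate update and that the Beta and binomial forms are nearly identical, are easy to check numerically. A minimal sketch, representing each Beta distribution by its (α, β) pair with α = successes + 1 and β = failures + 1 under the Bayes-Laplace prior Beta(1, 1); `update` is my own helper name, and `lgamma` from the standard library handles the Gamma functions in log space.

```python
from math import comb, exp, lgamma, log, pi


def binomial_pmf(s: int, f: int, p: float) -> float:
    """P(s | p, f) from the binomial formula."""
    return comb(s + f, s) * p**s * (1 - p) ** f


def beta_pdf(p: float, alpha: float, beta: float) -> float:
    """Beta(p | alpha, beta), computed in log space via lgamma."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(p) + (beta - 1) * log(1 - p))


def update(prior, data):
    """Conjugate update: Beta(a, b) prior + (successes, failures) -> Beta posterior."""
    a, b = prior
    s, f = data
    return (a + s, b + f)


flat = (1, 1)  # Bayes-Laplace prior, uniform on [0, 1]
you, friend = (4, 32), (15, 57)

# Your rolls as the prior, your friend's as the likelihood...
sequential = update(update(flat, you), friend)
# ...or pool everything and update once: the posterior is identical.
pooled = update(flat, (you[0] + friend[0], you[1] + friend[1]))
assert sequential == pooled == (20, 90)

# With alpha = s + 1 and beta = f + 1, Beta and binomial differ only by the
# constant factor (s + f + 1), here 4 + 32 + 1 = 37.
s, f, p = 4, 32, 0.2
print(beta_pdf(p, s + 1, f + 1) / binomial_pmf(s, f, p))  # about 37

# The Gamma function also accepts non-integers, e.g. the Jeffreys prior Beta(1/2, 1/2).
print(beta_pdf(0.5, 0.5, 0.5), 2 / pi)  # both about 0.6366
```

The constant factor is why the two produce the same curve shape: Γ(s+f+2)/(Γ(s+1)Γ(f+1)) is (s+f+1)!/(s! f!), one extra factor of s+f+1 over the binomial coefficient.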

In theory, then, it’s dirt easy to do a Bayesian analysis of Bem’s handiwork: tally up the successes and failures from each individual experiment, add them together, and plunk them into a binomial distribution. In practice, there are three hurdles. The easy one is the choice of prior; fortunately, Bem’s datasets are large enough that they swamp any reasonable prior, so I’ll just use the Bayes-Laplace one and be done with it. A bigger one is that we’ve got at least three distinct Bernoulli processes in play: pressing a button to classify an image (experiments 3, 4), remembering a word from a list (8, 9), and guessing the next image out of a binary pair (everything else). If you’re trying to describe precognition and think it varies depending on the input image, then the negative image trials have to be separated from the erotic image ones. Still, this amounts to little more than being careful with the datasets and thinking hard about how a universal precognition would be expressed via those separate processes.

The toughest of the bunch: Bem didn’t record the number of successes and failures, save experiments 8 and 9. Instead, he either saved log timings (experiments 3 and 4) or the success rate, as a percentage of all trials. This is common within frequentist statistics, which is obsessed with maximal likelihoods, but it destroys information we could use to build a posterior distribution. Still, this omission isn’t fatal. We know the number of successes and failures are integer values. If we correctly guess their sum and multiply it by the rate, the result will be an integer; if we pick an incorrect sum, it’ll be a fraction. A complication arrives if there are common factors between the number of successes and the total trials, but there should be some results which lack those factors. By comparing results to one another, we should be able to work out both what the underlying total was, as well as when that total changes, and in the process we learn the number of successes and can work backwards to the number of failures.
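The denominator hunt can be mechanised. A minimal sketch, assuming each reported rate is a percentage rounded to two decimal places and that every rate in a group shares one trial total; the function names and the three example rates (built from 19, 20, and 17 successes out of 36) are illustrative, not taken from Bem's files. On its own, 55.56% is ambiguous, because 20/36 reduces to 5/9; that is exactly the common-factor complication, and intersecting with the other rates resolves it.

```python
def candidate_totals(rate_pct: float, max_n: int, decimals: int = 2) -> set[int]:
    """Trial totals n (1..max_n) for which some integer success count k gives
    a percentage 100*k/n that rounds to the reported rate."""
    tol = 0.5 * 10 ** (-decimals) + 1e-9  # half a unit in the last reported digit
    found = set()
    for n in range(1, max_n + 1):
        k = round(rate_pct / 100 * n)  # the nearest integer is the only real candidate
        if abs(100 * k / n - rate_pct) <= tol:
            found.add(n)
    return found


def infer_total(rates, max_n: int = 60) -> set[int]:
    """Intersect the candidates for several rates assumed to share one total."""
    totals = set(range(1, max_n + 1))
    for r in rates:
        totals &= candidate_totals(r, max_n)
    return totals


rates = [52.78, 55.56, 47.22]  # 19/36, 20/36 and 17/36, rounded
print(candidate_totals(55.56, 60))  # ambiguous: every multiple of 9
print(infer_total(rates))           # {36}
```

Once the total is pinned down, each success count is round(rate × total / 100) and the failures are whatever remains. If the spreadsheet stores rates at full precision, the tolerance can be tightened accordingly.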

As the heading suggests, there’s something interesting hidden in the control groups. I’ll start with the binary image pair controls, which behave a lot like a coin flip; as the samples pile up, we’d expect the control distribution to migrate to the 50% line. When we do all the gathering, we find…

What happens when we combine the control groups for the binary image process from Bem (2011).

… that’s not good. Experiment 1 had a great control group, but the controls from experiment 5 and 6 are oddly skewed. Since they had a lot more samples, they wind up dominating the posterior distribution and we find ourselves with fully 92.5% of the distribution below the expected value of p = 0.5. This sets up a bad precedent, because we now know that Bem’s methodology can create a skew of 0.67% away from 50%; for comparison, the combined signal from all studies was a skew of 0.83%. Are there bigger skews in the methodology of experiments 2, 3, 4, or 7? We’ve got no idea, because Bem never ran control groups.
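A figure like the 92.5% above is the posterior mass on one side of a threshold. Here is a minimal, standard-library-only way to compute such a mass, assuming integer tallies and the Bayes-Laplace prior; it leans on the standard identity that for integer a and b, P(X ≤ x) for X ~ Beta(a, b) equals the probability that a Binomial(a + b - 1, x) variable is at least a. The function names are mine, and rather than re-typing Bem's tallies it's demonstrated on the die rolls from the thought experiment, asking how much of the pooled posterior lies below a fair 1/6.

```python
from math import exp, lgamma, log


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at 0 < x < 1, for integer a, b >= 1.

    Uses P(X <= x) = P(Binomial(a + b - 1, x) >= a), with each binomial
    term computed in log space so large tallies don't overflow.
    """
    n = a + b - 1
    total = 0.0
    for j in range(a, n + 1):
        log_term = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                    + j * log(x) + (n - j) * log(1 - x))
        total += exp(log_term)
    return total


def posterior_below(tallies, threshold: float = 0.5) -> float:
    """Pool (successes, failures) tallies from runs of one Bernoulli process,
    start from the Bayes-Laplace prior Beta(1, 1), and return the posterior
    probability that the success rate lies below `threshold`."""
    s = sum(succ for succ, _ in tallies)
    f = sum(fail for _, fail in tallies)
    return beta_cdf_int(threshold, s + 1, f + 1)


# The die example: how much of the pooled posterior sits below a fair 1/6?
print(posterior_below([(4, 32), (15, 57)], threshold=1 / 6))
```

Feeding in control-group tallies recovered with the denominator trick, with the default threshold of 0.5, gives the kind of number quoted above; a well-behaved control should hover near 50% rather than pile up on one side.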

Experiments 3 and 4 lack any sort of control, so we’re left to consider the strongest pair of experiments in Bem’s paper, 8 and 9. Bem used a Differential Recall score instead of the raw guess count, as it makes the null effect have an expected value of zero. This Bayesian analysis can cope with a non-zero null, so I’ll just use a conventional success/failure count.

Experiments 8 and 9 from Bem's 2011 paper.

On the surface, everything’s on the up-and-up. The controls have more datapoints between them than the treatment group, but there’s good and consistent separation between them and the treatment. Look very carefully at the numbers on the bottom, though; the effects are in quite different places. That’s strange, given the second study only differs from the first via some extra practice (page 14); I can see that improving the main control and treatment groups, but why does it also drag along the no-practice groups? Either there aren’t enough samples here to get rid of random noise, which seems unlikely, or the methodology changed enough to spoil the replication.

Come to think of it, one of those controls isn’t exactly a control. I’ll let Bem explain the difference.

Participants were first shown a set of words and given a free recall test of those words. They were then given a set of practice exercises on a randomly selected subset of those words. The psi hypothesis was that the practice exercises would retroactively facilitate the recall of those words, and, hence, participants would recall more of the to-be-practiced words than the unpracticed words. […]

Although no control group was needed to test the psi hypothesis in this experiment, we ran 25 control sessions in which the computer again randomly selected a 24-word practice set but did not actually administer the practice exercises. These control sessions were interspersed among the experimental sessions, and the experimenter was uninformed as to condition. [page 13]

So the “no-practice treatment,” as I dubbed it in the charts, is actually a test of precognition! It happens to be a lousy one, as without a round of post-hoc practice to prepare subjects their performance should be poor. Nonetheless, we’d expect it to do as well as or better than the matching controls. So why, instead, was it consistently worse? And not just a little worse, either; for experiment 9, it fell as far below its control as the main control fell below its treatment group.

What it all Means

I know, I seem to be a touch obsessed with one social science paper. The reason has less to do with the paper than the context around it: you can make a good argument that the current reproducibility crisis is thanks to Bem. Take the words of E.J. Wagenmakers et al.

Instead of revising our beliefs regarding psi, Bem’s research should instead cause us to revise our beliefs on methodology: The field of psychology currently uses methodological and statistical strategies that are too weak, too malleable, and offer far too many opportunities for researchers to befuddle themselves and their peers. […]

We realize that the above flaws are not unique to the experiments reported by Bem (2011). Indeed, many studies in experimental psychology suffer from the same mistakes. However, this state of affairs does not exonerate the Bem experiments. Instead, these experiments highlight the relative ease with which an inventive researcher can produce significant results even when the null hypothesis is true. This evidently poses a significant problem for the field and impedes progress on phenomena that are replicable and important.

Wagenmakers, Eric-Jan, et al. “Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011).” (2011): 426.

When it was pointed out that Bayesian methods wiped away his results, Bem started doing Bayesian analysis. When others pointed out a meta-analysis could do the same, Bem did that too. You want open data? Bem was a hipster on that front, sharing his data around to interested researchers and now the public. He’s been pushing for replication, too, and in recent years has begun pre-registering studies to stem the garden of forking paths. Bem appears to be following the rules of science, to the letter.

I also know from bitter experience that any sufficiently large research project will run into data quality issues. But, now that I’ve looked at Bem’s raw data, I’m feeling hoodwinked. I expected a few isolated issues, but nothing on this scale. If Bem’s 2011 paper really is a type specimen for what’s wrong with the scientific method, as practiced, then it implies that most scientists are garbage at designing experiments and collecting data.

I’m not sure I can accept that.

How to Become a Radical

If I had a word of the week, it would be “radicalization.” Part of why the term is hot in my circles is due to offline conversations, part stems from yet another aggrieved white male engaging in terrorism, and part from yet another study confirming that Trump voters were driven by bigotry (via fearing the loss of privilege that comes from giving up your superiority to promote equality).

Some just came in via Rebecca Watson, though, who pointed me to a fascinating study.

For example, a shift from ‘I’ to ‘We’ was found to reflect a change from an individual to a collective identity (…). Social status is also related to the extent to which first person pronouns are used in communication. Low-status individuals use ‘I’ more than high-status individuals (…), while high-status individuals use ‘we’ more often (…). This pattern is observed both in real life and on Internet forums (…). Hence, a shift from “I” to “we” may signal an individual’s identification with the group and a rise in status when becoming an accepted member of the group.

… I think you can guess what Step Two is. Walk away from the screen, find a pen and paper, write down your guess, then read the next paragraph.

The forum investigated here is one of the largest Internet forums in Sweden, called Flashback (…). The forum claims to work for freedom of speech. It has over one million users who, in total, write 15 000 to 20 000 posts every day. It is often criticized for being extreme, for example in being too lenient regarding drug related posts but also for being hostile in allowing denigrating posts toward groups such as immigrants, Jews, Romas, and feminists. The forum has many sub-forums and we investigate one of these, which focuses on immigration issues.

The total text data from the sub-forum consists of 964 Megabytes. The total amount of data includes 700,000 posts from 11th of July, 2004 until 25th of April, 2015.

How did you do? I don’t think you’ll need pen or paper to guess what these scientists saw in Step Three.

We expected and found changes in cues related to group identity formation and intergroup differentiation. Specifically, there was a significant decrease in the use of ‘I’ and a simultaneous increase in the use of ‘we’ and ‘they’. This has previously been related to group identity formation and differentiation to one or more outgroups (…). Increased usage of plural, and decreased frequency of singular, nouns have also been found in both normal, and extremist, group formations (…). There was a decrease in singular pronouns and a relative increase in collective pronouns. The increase in collective pronouns referred both to the ingroup (we) and to one or more outgroups (they). These results suggest a shift toward a collective identity among participants, and a stronger differentiation between the own group and the outgroup(s).

Brilliant! We’ve confirmed one way people become radicalized: by hanging around in forums devoted to “free speech,” the hate dumped on certain groups gradually creates an in-group/out-group dichotomy, bringing out the worst in us.

Unfortunately, there’s a problem with the staircase.

| Categories | Dictionaries | Example words | Mean r |
|---|---|---|---|
| Group differentiation | First person singular | I, my, me | -.0103 *** |
| | First person plural | We, our, us | .0115 *** |
| | Third person plural | They, them, their | .0081 *** |
| | Certainty | Absolutely, sure | .0016 NS |

***p < .001. NS = not significant. n = 11,751.
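
The dictionaries in the table above work by counting words. As a rough illustration of that kind of scoring, here is a minimal Go sketch that reports what fraction of a post's words fall into each dictionary. It is an assumption-laden toy: the hypothetical `dictionaries` map holds only the table's example words (the forum is Swedish, so the study's real word lists presumably differ), and `categoryShare` is a made-up helper, not the paper's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// dictionaries is hypothetical: it holds only the example words from the
// table, not the study's actual word lists.
var dictionaries = map[string][]string{
	"first person singular": {"i", "my", "me"},
	"first person plural":   {"we", "our", "us"},
	"third person plural":   {"they", "them", "their"},
}

// tokenize lower-cases text and splits it on anything that isn't a letter,
// so "won't" becomes "won" and "t".
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
}

// categoryShare returns, for each dictionary, the fraction of a post's words
// found in that dictionary. An empty post yields an empty map, so every
// lookup on it returns zero.
func categoryShare(post string) map[string]float64 {
	words := tokenize(post)
	shares := make(map[string]float64, len(dictionaries))
	if len(words) == 0 {
		return shares
	}
	for name, dict := range dictionaries {
		inDict := make(map[string]bool, len(dict))
		for _, w := range dict {
			inDict[w] = true
		}
		hits := 0
		for _, w := range words {
			if inDict[w] {
				hits++
			}
		}
		shares[name] = float64(hits) / float64(len(words))
	}
	return shares
}

func main() {
	shares := categoryShare("We know what they want, and our side won't let them have it.")
	names := make([]string, 0, len(shares))
	for name := range shares {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("%-22s %.3f\n", name, shares[name])
	}
}
```

Scores like these, paired with each post's date, are the raw material for a correlation over time; a negative r for the singular dictionary would mean “I” grows rarer as the years pass. The excerpt doesn't say exactly how the paper computed its Mean r, so treat this as an illustration only.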

Table 2 tripped me up, hard. I dropped by the ever-awesome R<-Psychologist and cooked up two versions of the same dataset. One has no correlation, while the other has a correlation coefficient of 0.01. Can you tell me which is which, without resorting to a straight-edge or photo editor?

Comparing two datasets, one with r=0, the other with r=0.01.

I can’t either, because the effect size is waaaaaay too small to be perceptible. That’s a problem, because it can be trivially easy to manufacture a bias at least that large. If we were talking about a system with very tight constraints on its behaviour, like the Higgs boson, then uncovering 500 bits of evidence over 2,500,000,000,000,000,000 trials could be too much for any bias to manufacture. But this study involves linguistics, which is far less precise than the Standard Model, so I need a solid demonstration of why this study is immune to biases on the scale of r = 0.01.

The authors do try to correct for how p-values exaggerate the evidence in large samples, but they do it by plucking p < 0.001 out of a hat. Not good enough; how does that p-value relate to studies of similar subject matter and methodology? Also, p-values stink. Also also, I notice there’s no control sample here. Do pro-social justice groups exhibit the same trend over time? What about the comment section of sports articles? It’s great that their hypotheses were supported by the data, don’t get me wrong, but it would be better if they’d tried harder to swat down their own hypotheses. I’d also like to point out that none of my complaints falsify their hypotheses; they merely demonstrate that the study falls well short of “confirmed” or “significant,” contrary to what I typed earlier.
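
The large-sample problem is easy to demonstrate. This minimal Go sketch holds r fixed at 0.01 and lets the sample size grow. It assumes a two-sided test of r = 0 via the Fisher z-transformation with a normal approximation, a standard textbook test but not necessarily the one the authors ran; `pValue` and `samplesNeeded` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// pValue gives the two-sided p-value for a sample correlation r over n pairs,
// testing against zero. Under the null, atanh(r)*sqrt(n-3) (Fisher's z) is
// approximately standard normal, so the tail area comes from erfc.
func pValue(r float64, n int) float64 {
	z := math.Atanh(r) * math.Sqrt(float64(n-3))
	return math.Erfc(math.Abs(z) / math.Sqrt2)
}

// samplesNeeded gives the smallest n at which a sample correlation of r
// reaches two-sided significance at level alpha, under the same
// approximation. r must be non-zero.
func samplesNeeded(r, alpha float64) int {
	zCrit := math.Sqrt2 * math.Erfcinv(alpha) // critical value of |z|
	root := zCrit / math.Atanh(r)
	return int(math.Ceil(root*root)) + 3
}

func main() {
	const r = 0.01
	for _, n := range []int{1_000, 10_000, 100_000, 1_000_000} {
		fmt.Printf("r = %.2f, n = %9d: p = %.3g\n", r, n, pValue(r, n))
	}
	fmt.Println("pairs needed for p < 0.001:", samplesNeeded(r, 0.001))
}
```

A correlation that's hopeless at a thousand pairs (p around 0.75) turns “highly significant” somewhere past a hundred thousand, even though the effect itself never changes. That's why a fixed cutoff like p < 0.001 says little on its own.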

Alas, I’ve discovered another path towards radicalization: perform honest research about the epistemology behind science. It’ll ruin your ability to read scientific papers, and leave you in despair about the current state of science.

Bayes Bunny iz trying to cool off after reading too many scientific papers.

The Laziness of Steven Pinker

I know, I know, I should have promoted that OrbitCon talk on Steven Pinker before it aired. I was a bit swamped developing material for it; ironically, most of that material never made it to air. Don’t worry, I’ll be sharing the good bits via blog post. Amusingly, this first example isn’t from that material. I wound up reading a lot of Pinker, and developed a hunch I wasn’t able to track down before air time. In a stroke of luck, Siggy handed me the material I needed to properly follow up.

Enough suspense: what’s your opinion of self-plagiarism, or copying your own work without flagging what you’ve done?

… self-plagiarism does carry with it some level of dishonesty, at least in some situations. The problem is that, when an author, artist or other creator presents a new work, it’s generally expected to be all-new content, unless otherwise clearly stated. … with an academic paper, one is generally expected to showcase what they have learned most recently, meaning that self-plagiarism defeats the purpose of the paper or the assignment. On the other hand, in a creative environment, however, reusing old passages, especially in a limited manner, might be more about homage and maintaining consistency than plagiarism.

It’s a bit of a gray area, isn’t it? The US Office of Research Integrity declares it unethical, but also declares that self-plagiarism isn’t misconduct. Nonetheless, it could be considered misconduct in an academic context, and the ORI themselves outline the case:

For example, in one editorial, Schein (2001) describes the results of a study he and a colleague carried out which found that 92 out of 660 studies taken from 3 major surgical journals were actual cases of redundant publication. The rate of duplication in the rest of the biomedical literature has been estimated to be between 10% to 20% (Jefferson, 1998), though one review of the literature suggests the more conservative figure of approximately 10% (Steneck, 2000). However, the true rate may depend on the discipline and even the journal and more recent studies in individual biomedical journals do show rates ranging from as low as just over 1% in one journal to as high as 28% in another (see Kim, Bae, Hahm, & Cho, 2014) The current situation has become serious enough that biomedical journal editors consider redundancy and duplication one of the top areas of concern (Wager, Fiack, Graf, Robinson, & Rowlands, 2009) and it is the second highest cause for articles to be retracted from the literature between the years 2007 and 2011 (Fang, Steen, & Casadevall, 2012).

But is it misconduct in the context of non-academic science writing? I’m not sure, but I think it’s fair to say self-plagiarism counts as lazy writing. Whatever the ethics, let’s examine an essay by Pinker that Edge published sometime before January 10th, 2017, and match it up against Chapter 2 of Enlightenment Now. I’ve checked the footnotes and preface of the latter, and failed to find any reference to that Edge essay, while the former does not say it’s excerpted from a forthcoming book. You’d have no idea one copy existed if you’d only read the other, so any matching passages count as self-plagiarism.

How many passages match? I’ll use the Edge essay as a base, and highlight exact duplicates in red, sections only present in Enlightenment Now in green, paraphrases in yellow, and essay-only text in black.

The Second Law of Thermodynamics states that in an isolated system (one that is not taking in energy), entropy never decreases. (The First Law is that energy is conserved; the Third, that a temperature of absolute zero is unreachable.) Closed systems inexorably become less structured, less organized, less able to accomplish interesting and useful outcomes, until they slide into an equilibrium of gray, tepid, homogeneous monotony and stay there.

In its original formulation the Second Law referred to the process in which usable energy in the form of a difference in temperature between two bodies is inevitably dissipated as heat flows from the warmer to the cooler body. (As the musical team Flanders & Swann explained, “You can’t pass heat from the cooler to the hotter; Try it if you like but you far better notter.”) A cup of coffee, unless it is placed on a plugged-in hot plate, will cool down. When the coal feeding a steam engine is used up, the cooled-off steam on one side of the piston can no longer budge it because the warmed-up steam and air on the other side are pushing back just as hard.

Once it was appreciated that heat is not an invisible fluid but the energy in moving molecules, and that a difference in temperature between two bodies consists of a difference in the average speeds of those molecules, a more general, statistical version of the concept of entropy and the Second Law took shape. Now order could be characterized in terms of the set of all microscopically distinct states of a system (in the original example involving heat, the possible speeds and positions of all the molecules in the two bodies). Of all these states, the ones that we find useful from a bird’s-eye view (such as one body being hotter than the other, which translates into the average speed of the molecules in one body being higher than the average speed in the other) make up a tiny sliver of the possibilities, while the disorderly or useless states (the ones without a temperature difference, in which the average speeds in the two bodies are the same) make up the vast majority. It follows that any perturbation of the system, whether it is a random jiggling of its parts or a whack from the outside, will, by the laws of probability, nudge the system toward disorder or uselessness —not because nature strives for disorder, but because there are so many more ways of being disorderly than of being orderly. If you walk away from a sand castle, it won’t be there tomorrow, because as the wind, waves, seagulls, and small children push the grains of sand around, they’re more likely to arrange them into one of the vast number of configurations that don’t look like a castle than into the tiny few that do. [Enlightenment Now adds five sentences here.]

 

I could carry on (and have!), demonstrating that almost all of that essay reappears in Pinker’s book. Maybe half of the reappearance is verbatim. I figure he copy-pasted the contents of his January 2017 essay into the manuscript for his 2018 book, and expanded it to fill an entire chapter. Whether I’m right or wrong, I think the similarities make a damning case for intellectual laziness. It also sets a bad precedent: if Pinker can get this lazy with his non-academic writing, how lazy can he be with his academic work? I haven’t looked into that, and I’m curious if anyone else has.
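
Colour-coding by hand is tedious. For anyone who wants to put a number on “maybe half is verbatim,” one common approach is word n-gram (“shingle”) overlap. Here is a minimal Go sketch that reports what fraction of one text's distinct n-word runs appear verbatim in another. The tokenizer and choice of n are my assumptions, the helper names are hypothetical, and this is not necessarily how the comparison above was done.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// words lower-cases text and splits it on anything that isn't a letter or
// digit, so punctuation and line breaks don't block a match.
func words(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// shingles returns the set of distinct runs of n consecutive words.
func shingles(ws []string, n int) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+n <= len(ws); i++ {
		set[strings.Join(ws[i:i+n], " ")] = true
	}
	return set
}

// verbatimShare reports the fraction of source's distinct n-word shingles
// that also appear somewhere in target. It returns 0 if source has fewer
// than n words.
func verbatimShare(source, target string, n int) float64 {
	src := shingles(words(source), n)
	if len(src) == 0 {
		return 0
	}
	tgt := shingles(words(target), n)
	shared := 0
	for s := range src {
		if tgt[s] {
			shared++
		}
	}
	return float64(shared) / float64(len(src))
}

func main() {
	// Toy texts, not quotes from either Pinker work.
	essay := "The quick brown fox jumps over the lazy dog."
	book := "The quick brown fox leaps over the lazy dog."
	fmt.Printf("share of 3-word runs found verbatim: %.2f\n", verbatimShare(essay, book, 3))
}
```

For real prose, a longer run of five or more words is probably a safer choice, so that stock phrases like “on the other hand” don't count as copying; where to set that threshold is a judgment call.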

To A Burnt-Out Activist

The scandal brewing at the end of my post has come to pass. This one hurt a little bit; publicly, at least, Silverman seemed to be in favor of policies that would reduce sexual assault, and spoke out against the bigots in our movement. In reality, given the evidence, he was talking the talk but not walking the walk.

That comes on top of my growing unease over that last blog post. There’s nothing in there worth changing, that I’m aware of; the problem is more with what it doesn’t say, and who it mentions in passing but otherwise leaves at the margin.

See, there’s a pervasive belief that minorities are responsible for bringing about social justice, either by claiming they created the problem or by demanding they educate everyone. That falls apart if you spend a half-second dwelling on it. The majority, by definition, hold most of the power in society. If they acknowledged the injustice done to the minority, they’d use that power to help resolve it. In reality, they tend to bury their heads in the sand, ignoring the evidence of injustice or finding ways to excuse it, so their power is often wielded against the minority. The result is that the minority has to spend an enormous amount of time and energy educating and agitating the majority.

So you can see why calling for people to fight harder for the change they’d like to see, as I did last blog post, can seem clueless and even heartless. Yes, I placed a few lines in there to hint that I was talking to the majority, but those have to be weighed against the context I outlined above. This time around, I’d rather focus on the burnt-out activist than the clueless white guy.

Put bluntly, life is short. You should spend your time doing things you find rewarding; endlessly quoting painful testimony of sexual assault, or the science and statistics of how tragically common it is, or giving an embarrassingly basic lecture on consent, doesn’t stay in that category for long. The resulting feelings of burnout or frustration are entirely valid, and worthy of taking seriously.

Human beings are also complex; we exist in many cultures and movements. I sometimes advocate for secularism, but I’ve also written about science and statistics, and even dabbled in art from time to time. If one aspect of my life becomes frustrating, I can easily switch to another, and there’s nothing wrong with that switch. This may seem like a betrayal; how can you leave your sisters behind as they carry on fighting the good fight?

But it’s extremely rare for a single person to change a culture; in practice, change comes via a sustained, coordinated effort from multiple people. At worst, the loss of one person may slow things down, and even that is debatable: there’s an unstated premise here that once you’ve dropped out of a culture, you can’t come back. That should be obviously false (and if it isn’t, run). If you can return, though, then why not use the time away to recharge? You’ll get a helluva lot more done ducking out from time to time to fight burnout than you would if you stuck around when you don’t care to.

I have tremendous sympathy for the people who are sick of arguing against all the sexism, racism, ableism, and so on within the atheist and skeptic movements. Take as long a break as you need to, come back if or when you feel it’s time. There should be an empty seat waiting for you, and if there isn’t you’ll be in a better place to flip everyone the bird and create a new culture that gets this shit right.


(As a side-note, I found it amusing when I began working through the OrbitCon talks and heard Greta Christina laying out similar points. She has been a big influence on my views on activism for several years, so the overlap is less surprising in hindsight.)