The Two Cultures, as per C. P. Snow

I’d never heard of C. P. Snow until Steven Pinker brought him up, but apparently he’s quite a big deal. Much of that stems from a lecture Snow gave nearly sixty years ago. It’s been discussed and debated (funny meeting you here, Lawrence Krauss) to the point that I, several generations and one ocean away, can grab a reprint of the original with an intro about as long as the lecture itself.

Snow’s core idea is this: two types of intellectuals, scientists and elite authors, don’t talk with one another and are largely ignorant of each other’s work. His quote about elite authors being ignorant of physics is plastered everywhere, so I’d like to instead repeat what he said about scientists being ignorant of literature:

As one would expect, some of the very best scientists had and have plenty of energy and interest to spare, and we came across several who had read everything that literary people talk about. But that’s very rare. Most of the rest, when one tried to probe for what books they had read, would modestly confess “Well, I’ve tried a bit of Dickens”, rather as though Dickens were an extraordinarily esoteric, tangled and dubiously rewarding writer, something like Rainer Maria Rilke. In fact that is exactly how they do regard him: we thought that discovery, that Dickens had been transformed into the type-specimen of literary incomprehensibility, was one of the oddest results of the whole exercise. […]

Remember, these are very intelligent men. Their culture is in many ways an exacting and admirable one. It doesn’t contain much art, with the exception, and important exception, of music. Verbal exchange, insistent argument. Long-playing records. Colour photography. The ear, to some extent the eye. Books, very little, though perhaps not many would go so far as one hero, who perhaps I should admit was further down the scientific ladder than the people I’ve been talking about – who, when asked what books he read, replied firmly and confidently: “Books? I prefer to use my books as tools.” It was very hard not to let the mind wander – what sort of tool would a book make? Perhaps a hammer? A primitive digging instrument?

[Snow, Charles P. “The two cultures.” (1959): pg. 6-7]

To be honest, I have a hard time comprehending why the argument exists. If I were to transpose it to my place and time, it would be like complaining that Margaret Atwood, Alice Munro, and Michael Ondaatje are shockingly ignorant of basic physics, while if you were to quiz famous Canadian scientists about Canadian literature you’d eventually drag out a few mentions of Farley Mowat. I… don’t see the problem? Yes, it would be great if more people knew more things, but if you want to push the frontiers of knowledge you’ve got to focus on the specifics. Given that your time is (likely) finite, that means sacrificing some general knowledge. It would be quite ridiculous to expect someone in one speciality to explain the specifics of another.

If we forget the scientific culture, then the rest of western intellectuals have never tried, wanted, or been able to understand the industrial revolution, much less accept it. Intellectuals, in particular literary intellectuals, are natural Luddites. [pg. 11-12]

The academics had nothing to do with the industrial revolution; as Corrie, the old Master of Jesus, said about trains running into Cambridge on Sunday, ‘It is equally displeasing to God and to myself’. So far as there was any thinking in nineteenth-century industry, it was left to cranks and clever workmen. American social historians have told me that much the same was true of the U.S. The industrial revolution, which began developing in New England fifty years or so later than ours, apparently received very little educated talent, either then or later in the nineteenth century. [pg. 12]

… do we understand how they have happened? Have we begun to comprehend even the old industrial revolution? Much less the new scientific revolution in which we stand? There never was any thing more necessary to comprehend. [pg. 14]

Yep, that’s Snow trashing authors of high fiction for not understanding the Industrial Revolution. It’s not an isolated case, either; Snow also criticises Cambridge art graduates for not being aware of “the human organisation” behind buttons [pg. 15]. He might as well have spent several paragraphs yelling at physicists for being unable to explain why Holden Caulfield wanted to be a gas station attendant; he’s that far from reality.

Which gets us to the real consequences of Snow’s divide, and how he proposes heading them off at the pass.

To say we have to educate ourselves or perish, is a little more melodramatic than the facts warrant. To say, we have to educate ourselves or watch a steep decline in our own lifetime, is about right. We can’t do it, I am now convinced, without breaking the existing pattern. I know how difficult this is. It goes against the emotional grain of nearly all of us. In many ways, it goes against my own, standing uneasily with one foot in a dead or dying world and the other in a world that at all costs we must see born. I wish I could be certain that we shall have the courage of what our minds tell us. [pg. 20]

This disparity between the rich and the poor has been noticed. It has been noticed, most acutely and not unnaturally, by the poor. Just because they have noticed it, it won’t last for long. Whatever else in the world we know survives to the year 2000, that won’t. Once the trick of getting rich is known, as it now is, the world can’t survive half rich and half poor. It’s just not on.

The West has got to help in this transformation. The trouble is, the West with its divided culture finds it hard to grasp just how big, and above all just how fast, the transformation must be. [pg. 21-22]

So we need to educate scientists about the works of elite authors, and those authors about the work of scientists… because otherwise Britain will become impoverished, and/or we’d end poverty faster?! That doesn’t square with the data. Let’s look at what the government of Namibia, a well-off African country, thinks will help end poverty.

  • Improving access to Community Skills Development Centres (Cosdecs) in remote areas and aligning the curriculum with that of the Vocational Training Centres.
  • To improve career options and full integration into the modern economy, there is need to introduce vocational subjects at upper primary and junior secondary levels. This will facilitate access to vocational education and labour market readiness by the youth.
  • Improving productivity of the subsistence agriculture by encouraging the use of both traditional and modern fertiliser and by providing information on modern farming methods.
  • The dismantling of the “Red Line” seems to hold some promise for livestock farmers in the North who were previously prevented access to markets outside of the northern regions.
  • Consider establishing a third economic hub for Namibia to relieve Khomas and Erongo from migration pressure. With abundant water resources, a fertile land and being along the Trans Zambezi Corridor, Kavango East is a good candidate for an agricultural capital and a logistic growth point.
  • Given persistent drop-out rates especially in remote rural areas, there is need for increased access to secondary education by addressing both the distance and the quality of education.
  • Educate youth on the danger of adolescent pregnancy both in terms of exclusion from the modern economy and health implications.
  • Given the established relationship between access to services, poverty and economic inclusion, there is need for government to strive towards a regional balanced provision of access to safe drinking water, sanitation, electricity and housing. [pg. 58-59]

I don’t see any references to science in there, nor any to Neshani Andreas or Joseph Diescho. Britain’s the same story. But who knows, maybe an author/chemist who thought world poverty would end by the year 2000 has a better understanding of poverty than government agencies and century-old NGOs tasked with improving social conditions.

There’s a greater problem here, too. Let’s detour to something Donald Trump said:

Trump: “The Democrats don’t care about our military. They don’t.” He says that is also true of the border and crime

How would we prove that Democrats don’t care about the military, the US border, or crime? The easiest approach would be to look at their national platform and see if those things are listed there (they’re not, I checked). A much harder one would be to parse their actions instead. If we can find a single Democrat who does care about crime, then we’ve refuted the claim in the deductive sense.

But there’s still an inductive way to keep it alive: if “enough” Democrats don’t care about those things, then Trump can argue he meant the statements informally and thus it’s still true-ish. That’s a helluva lot of work, and since the burden is on the person making the claim it’s not my job to run around gathering data for Trump’s argument. If I’m sympathetic to Trump’s views or pride myself on being intellectually “fair,” however, there’s a good chance I’d do some of his homework anyway.

Lurking behind all of the logical stuff, however, is an emotional component. The US-Mexico border, the military, and crime all stir strong emotions in his audience; by positioning his opponents as being opposed to “positive” things, while implying that he’s in favour of them, Trump is angering his audience and motivating them to be less charitable towards his opponents.

That’s the language of hate: emotionally charged false statements about a minority, to be glib. It’s all the more reason to be careful when talking about groups.

The non-scientists have a rooted impression that the scientists are shallowly optimistic, unaware of man’s condition. On the other hand, the scientists believe that the literary intellectuals are totally lacking in foresight, peculiarly unconcerned with their brother men, in a deep sense anti-intellectual, anxious to restrict both art and thought to the existential moment. And so on. Anyone with a mild talent for invective could produce plenty of this kind of subterranean back-chat. [pg. 3]

If you side with either scientists or elite authors, this is emotionally charged language. At the same time, I have no idea how you’d even begin to prove half of that. Snow’s defence consists of quoting Ernest Rutherford and T.S. Eliot; all the rest comes from his experiences with “intimate friends among both scientists and writers” and “living among these groups and much more.” [pg. 1] Nonetheless, that small sample is enough for Snow to assert “this is a problem of the entire West.” [pg. 2] Calling scientists or elite authors a minority is a stretch, but the net result is similar: increased polarisation between the two groups, and the promotion of harmful myths.

Yes, Snow would go on to propose a “third culture” which would bridge the gap, but if the gap doesn’t exist in the first place, this amounts to selling you a cure after convincing you you’re sick.

What’s worse is that if you’re operating in a fact-deficient environment, you’ve got tremendous flexibility to tweak things to your liking. Is J.K. Rowling a “literary intellectual?” She doesn’t fit into the highbrow culture Snow was talking about, but she is a well-known and influential author who isn’t afraid to let her opinions be known (for better or worse). Doesn’t that make her a decision maker, worthy of inclusion? And if we’ve opened the door for non-elite authors, why not add other people from the humanities? Or social scientists?

This also means that one of the harshest critics of C. P. Snow is C. P. Snow.

I have been argued with by non-scientists of strong down-to-earth interests. Their view is that it is an over-simplification, and that if one is going to talk in these terms there ought to be at least three cultures. They argue that, though they are not scientists themselves, they would share a good deal of the scientific feeling. They would have as little use – perhaps, since they knew more about it, even less use – for the recent literary culture as the scientists themselves. …

I respect those arguments. The number 2 is a very dangerous number: that is why the dialectic is a dangerous process. Attempts to divide anything into two ought to be regarded with much suspicion. I have thought a long time about going in for further refinements: but in the end I have decided against. I was searching for something a little more than a dashing metaphor, a good deal less than a cultural map: and for those purposes the two cultures is about right, and subtilising any more would bring more disadvantages than it’s worth. [pg. 5]

He’s aware that some people regard “the two cultures” as an oversimplification, he recognises the problem with dividing people in two, and his response amounts to “well, I’m still right.” He’s working with such a deficiency of facts that he can undercut his own arguments and still keep making them as if no counter-argument existed.

I think it is only fair to say that most pure scientists have themselves been devastatingly ignorant of productive industry, and many still are. It is permissible to lump pure and applied scientists into the same scientific culture, but the gaps are wide. Pure scientists and engineers often totally misunderstand each other. Their behaviour tends to be very different: engineers have to live their lives in an organised community, and however odd they are underneath they manage to present a disciplined face to the world. Not so pure scientists. [pg. 16]

Snow makes a strong case for a third culture here, something he earlier said would “bring more disadvantages than it’s worth.” He’s seeing gaps and divisions everywhere, and defining things so narrowly that he can rattle off five counter-examples and then immediately dismiss them (emphasis mine).

Almost everywhere, though, intellectual persons didn’t comprehend what was happening. Certainly the writers didn’t. Plenty of them shuddered away, as though the right course for a man of feeling was to contract out; some, like Ruskin and William Morris and Thoreau and Emerson and Lawrence, tried various kinds of fancies which were not in effect more than screams of horror. It is hard to think of a writer of high class who really stretched his imaginative sympathy, who could see at once the hideous back-streets, the smoking chimneys, the internal price – and also the prospects of life that were opening out for the poor, the intimations, up to now unknown except to the lucky, which were just coming within reach of the remaining 99.0 per cent of his brother men.

Snow himself mentions Charles Dickens earlier in the lecture, a perfect fit for the label of “a writer of high class who really stretched his imaginative sympathy.” And yet here, he has difficulty remembering that author’s existence.

It’s oddly reminiscent of modern conservative writing: long-winded, self-important, and with only a fleeting connection to the facts. No wonder conservatives keep resurrecting his ideas; they can be warped and distorted to suit your current needs.

Something for the Reading List

For nearly a decade, I have been researching and writing about women who dressed and lived as men and men who lived and dressed as women in the nineteenth-century American West. During that time, when people asked me about my work, my response was invariably met with a quizzical expression and then the inevitable question: “Were there really such people?” Newspapers document hundreds, in fact, and it is likely there were many more. Historians have been writing about cross-dressers for some time, and we know that such people have existed in all parts of the world and for about as long as we have recorded and remembered history.

Boag, Peter. “The Trouble with Cross-Dressers: Researching and Writing the History of Sexual and Gender Transgressiveness in the Nineteenth-Century American West.” Oregon Historical Quarterly 112, no. 3 (2011): 322–39. https://doi.org/10.5403/oregonhistq.112.3.0322.

Human beings have a really distorted view of history; we tend to project our experiences backward in time. Just recently introduced to the term “transgender?” Then transgender people must have only recently been invented, in the same way that bromances never existed before the term was added to the dictionary. Everyone is prone to this error, however, not just the bigots.

A central argument of my book is that many nineteenth-century western Americans who cross-dressed did so to express their transgender identity. Transgender is a term coined only during the last quarter of the twentieth century. It refers to people who identify with the gender (female or male) “opposite” of what society would typically assign to their bodies. I place “opposite” in quotation marks because the notion that female and male are somehow diametric to each other is a historical creation; scholars have shown, for example, that in the not-too-distant past, people in western civilization understood that there was only one sex and that male and female simply occupied different gradations on a single scale. That at one time the western world held to a one-sex or one-gender model, but later developed a two-sex or two-gender model, clearly shows that social conceptualization of gender, sex, and even sexuality changes over time. This reveals a problem that confronts historians: it is anachronistic to impose our present-day terms and concepts for and about gender and sexuality — such as transgender — onto the past.

In Re-Dressing America’s Frontier Past, I therefore strove to avoid the term transgender as much as possible. It is central to my study, however, to show that people in the nineteenth century had their own concepts and expressions for gender fluidity. By the end of the nineteenth century, for example, sexologists (medical doctors and scientists who study sex) had created the terms “sex invert” and “sexual inversion” to refer to people whose sexual desires and gender presentations (that is, the way they walked and talked, the clothing they wanted to wear, and so forth) did not, according to social views, conform to what their physiological sex should “naturally” dictate.

I wish I’d known about this book earlier; it would have made a cool citation. Oh well, either way it’s long since hit the shelves and has been patiently waiting for a spot on your wishlist.

Feeling the Research

Daryl Bem must be sick of those puns by now.

Back in 2011 he published Feeling the Future, a paper that combined multiple experiments on human precognition to argue it was a thing. Naturally this led to a flurry of replications, many of which riffed on his original title. I got interested through a series of blog posts in which, rather surprisingly, I used what he published to conclude precognition doesn’t exist.

I haven’t been Bem’s only critic, and one with a much higher profile than mine has engaged with him extensively, both publicly and privately. In the process, they published Bem’s raw data. For months, I’ve wanted to revisit that series with this new bit of data, but I’m realising as I type this that it shouldn’t live in that Bayes 20x series. For starters, I don’t need to introduce any new statistical tools to do this analysis; all the new content here relates to the dataset itself. To make that easier to follow, I’ve taken the original Excel files and tossed them into a Google spreadsheet. I’ve re-organized the sheets in the order the experiments were done, added some new columns for numeric analysis, and popped a few annotations in.

Odd Data

The first thing I noticed was that the experiments were not presented in the order they were actually conducted. It looks like he re-organized the studies to make a better narrative for the paper, implying he had a grand plan when in fact he was switching between experimental designs. This doesn’t affect the science, though, and while never stating the exact order, Bem hints at this reordering on pages three and nine of Feeling the Future.

What may affect the science are the odd timings present within many of the datasets. As Dr. R pointed out in a post I linked earlier, Bem combined two 50-sample studies for the fifth experiment in his paper, and three studies of 91, 19, and 40 students for the sixth. Pasting together studies like that is a problem within frequentist statistics, due to the “stopping problem.” Stopping early is bad, because random fluctuations may blow the p-value across the “statistically significant” line when additional data would have revealed a non-significant result; but stopping too late is also bad, because p-values tend to exaggerate the evidence against the null hypothesis, and the problem gets worse the more data you add.
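To see why the stopping-early half matters, here is a minimal simulation sketch (my own illustration, not Bem's procedure). It assumes a fair coin, so the null hypothesis is true by construction, and uses a normal-approximation test for a proportion; the function names and the 200-trial, peek-every-10 setup are arbitrary choices. It doesn't touch the stopping-too-late half of the problem.

```python
import math
import random

def two_sided_p(successes, n, p0=0.5):
    """Normal-approximation two-sided p-value for an observed proportion."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (successes / n - p0) / se
    # erfc(|z| / sqrt(2)) is the two-tailed area of a standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(peek_every, max_n, runs=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies that ever reach p < alpha, when each study
    stops at its first significant peek (the null is true in every run)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Draw the whole sequence up front so each run uses the same coin flips
        # no matter when it stops.
        flips = [rng.random() < 0.5 for _ in range(max_n)]
        successes = 0
        for n, flip in enumerate(flips, start=1):
            successes += flip
            if n % peek_every == 0 and two_sided_p(successes, n) < alpha:
                hits += 1
                break
    return hits / runs

fixed = false_positive_rate(peek_every=200, max_n=200)   # one test at the end
peeking = false_positive_rate(peek_every=10, max_n=200)  # test every 10 trials
print(f"single test: {fixed:.3f}, peeking: {peeking:.3f}")
```

With a single test at the end, the false-positive rate should land near the nominal 5%; checking every ten trials and stopping at the first significant result pushes it well above that, even though every coin is fair.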

But when poring over the datasets, I noticed additional gaps and oddities that Dr. R missed. Each dataset records a timestamp for when each subject took the test, presumably generated by the hardware or software. These subjects were undergrad students at a college, and grad students likely administered some or all of the tests. So we’d expect subject timestamps to be largely Monday-to-Friday affairs in a continuous block. Since these are machine-generated or copy-pasted from machine-generated logs, we should see a monotonic increase.
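Checks like these are easy to automate. The sketch below assumes the timestamps have already been parsed into Python `datetime` objects (the sample values are made up, not taken from Bem's files), and flags sessions that run backwards, fall on a weekend, or follow a gap longer than a week; the one-week threshold and the `audit_timestamps` name are my own assumptions.

```python
from datetime import datetime, timedelta

def audit_timestamps(stamps, max_gap=timedelta(days=7)):
    """Return (index, reason) pairs for suspicious session timestamps."""
    flags = []
    for i, ts in enumerate(stamps):
        if ts.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
            flags.append((i, "weekend session"))
        if i == 0:
            continue
        delta = ts - stamps[i - 1]
        if delta < timedelta(0):
            flags.append((i, "out of order"))
        elif delta > max_gap:
            flags.append((i, f"gap of {delta.days} days"))
    return flags

# Made-up sessions for illustration only.
sessions = [
    datetime(2009, 3, 2, 10, 0),   # Monday
    datetime(2009, 3, 2, 11, 0),
    datetime(2009, 3, 3, 9, 30),   # Tuesday
    datetime(2009, 3, 2, 15, 0),   # earlier than the session before it
    datetime(2009, 3, 7, 14, 0),   # a Saturday
    datetime(2009, 6, 15, 10, 0),  # after a summer-length gap
]
for index, reason in audit_timestamps(sessions):
    print(index, reason)
```

A flag is only a prompt to look closer; holidays and scheduling quirks will trip it too.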

Yet the 91-subject run that makes up part of the sixth experiment has a three-month gap after subject #50. Presumably the summer break prevented Bem from finding subjects, but what sort of study runs for a month, stops for three, then carries on for one more? On the other hand, that logic rules out all forms of replication. If the experimental parameters and procedure did not change over that time span, either by the researcher’s hand or due to external events, there’s no reason to think the later subjects differ from the earlier ones.

Look more carefully and you see that up until subject #49 there were several subjects per day, followed by a near two-week pause until subject #50 arrived. It looks an awful lot like Bem was aiming for fifty subjects during that time, was content when he reached forty-nine, then luck and/or a desire for even numbers made him add number fifty. If Bem was really aiming for at least 100 subjects, as he claimed in a footnote on page three of his paper, he could have easily added more than fifty, paused the study, and resumed in the fall semester. Most likely, he was aiming for a study of fifty subjects back then, suggesting the remaining forty-one were originally the start of a second study before later being merged.

Experiments 1, 2, 4, and 7 also show odd timestamps. Many of these can be explained by Spring Break or Thanksgiving holidays, but many also stop at round numbers. There are also instances where some timestamps occur out of order or the sequence number reverses itself. This is pretty strong evidence of human tampering, though “tampering” isn’t synonymous with “fraud”; any sufficiently large study will have mistakes, and any attempt to correct those mistakes will look like fraud. That still creates uncertainty in a dataset and necessarily lowers our trust in it.

I’ve also added stats for the individual runs, and some of them paint an interesting tale. Take experiment 2, for instance. As of the pause after subject #20, the success rate was 52.36%, but between subject #20 and #100 it was instead 51.04%. The remaining 50 subjects had a success rate of 52.39%, bringing the total rate up to 51.67%. Why did I place a division between those first hundred and last fifty? There’s no time-stamp gap there, and no sign of a parameter shift. Nonetheless, if we look at pages five and six of the paper, we find:

For the first 100 sessions, the flashed positive and negative pictures were independently selected and sequenced randomly. For the subsequent 50 sessions, the negative pictures were put into a fixed sequence, ranging from those that had been successfully avoided most frequently during the first 100 sessions to those that had been avoided least frequently. If the participant selected the target, the positive picture was flashed subliminally as before, but the unexposed negative picture was retained for the next trial; if the participant selected the nontarget, the negative picture was flashed and the next positive and negative pictures in the queue were used for the next trial. In other words, no picture was exposed more than once, but a successfully avoided negative picture was retained over trials until it was eventually invoked by the participant and exposed subliminally. The working hypothesis behind this variation in the study was that the psi effect might be stronger if the most successfully avoided negative stimuli were used repeatedly until they were eventually invoked.

So precisely when Bem hit a round number and found the signal strength was getting weaker, he tweaked the parameters of the experiment? That’s sketchy, especially if he peeked at the data during the pause at subject #20. If he didn’t, the parameter tweak is easier to justify, as he’d already hit his goal of 100 subjects and had time left in the semester to experiment. Combining both experimental runs would still be a no-no, though.
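As a sanity check on those percentages, here's a small sketch that pools the three runs of experiment 2. It assumes every subject ran the same number of trials, so subject counts can serve as weights; if that assumption fails, the pooled figure would shift slightly. The `pooled_rate` helper is hypothetical.

```python
def pooled_rate(segments):
    """Combine (subjects, success_rate_percent) segments into one overall rate.

    Assumes every subject ran the same number of trials, so the subject
    count can serve as the weight for each segment.
    """
    total_subjects = sum(n for n, _ in segments)
    return sum(n * rate for n, rate in segments) / total_subjects

# Subjects 1-20, 21-100, and 101-150 of experiment 2, as quoted above.
experiment_2 = [(20, 52.36), (80, 51.04), (50, 52.39)]
print(round(pooled_rate(experiment_2), 2))      # 51.67
print(round(pooled_rate(experiment_2[:2]), 2))  # first hundred sessions only
```

Under the same assumption, the first hundred sessions on their own work out to about 51.30%, so the last fifty pulled the total up.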

Uncontrolled Controls

Bem’s inconsistent use of controls was present in the paper, but it’s a lot more obvious in the dataset. In experiments 2, 3, 4, and 7 there is no control group at all. That is dangerous. If you run a control group through a protocol nearly identical to that of the experimental group, and you don’t get a null result, you’ve got good evidence that the procedure is flawed. If you don’t run a control group, you’d better be damn sure your experimental procedure has been proven reliable in prior studies, and that you’re following the procedure closely enough to prevent bias.

Bem doesn’t clear that bar for experiments 2 and 7; the latter isn’t a replication of any prior study he’s carried out, and while the former is a replication of experiment 1, the earlier study was carried out two years before and appears to have been two separate sample runs pasted together, each with different parameters. In experiments 3 and 4, Bem’s comparing something he knows will have an effect (forward priming) with something he hopes will have an effect (retroactive priming). There’s no explicit comparison of the known effect’s size to that found in other studies; Bem’s write-up appears to settle for showing statistical significance. Merely showing there is an effect does not demonstrate that the effect is of the expected magnitude.

Conversely, experiments 5 and 6 have a very large number of controls, relative to the experimental conditions. This is wasteful, certainly, but it could also throw off the analysis: since the confidence interval narrows as more samples are taken, we can tighten one side up by throwing more datapoints in and taking advantage of the p-value’s weakness.

Experiment 6 might show this in action. For the first fifty subjects, the control group was further from the null value than the negative image group, but not as extreme as the erotic image one. Three months later, the control group for the next forty-one subjects is further from the null value than both experimental groups, but this time in the opposite direction! Here, Bem drops the size of the experimental groups and increases the size of the control group; for the next nineteen subjects, the control group is again more extreme than the negative image group and again less extreme than the erotic group, plus the polarity has flipped again. For the last forty subjects, Bem increased the sizes of all groups by 25%, but the control is again more extreme and the polarity has flipped yet once more. Nonetheless, adding all four runs together allows all that flopping to cancel out, and Bem to honestly write “On the neutral control trials, participants scored at chance level: 49.3%, t(149) = -0.66, p = .51, two-tailed.” This looks a lot like tweaking parameters on the fly to get a desired outcome.

It also shows there’s substantial noise in Bem’s instruments. What are the odds that the negative image group’s success rate would show less variance than the control group’s, despite having anywhere from a third to a sixth of the sample size? How can its success rate show less variance than the erotic image group’s, despite having the same sample size? These scenarios aren’t impossible, but coming at a time when Bem was focused on precognition via negative images, it’s all quite suspicious.
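The expected relationship between group size and noise is easy to check. For a fixed success probability p, the observed success rate over n trials has standard deviation √(p(1−p)/n), so a group with a sixth of the trials should wobble about √6 ≈ 2.4 times as much. The sketch below compares that formula against a simulation of pure coin flips; the trial counts are hypothetical, not Bem's.

```python
import math
import random
import statistics

def rate_sd(n_trials, p=0.5):
    """Theoretical standard deviation of an observed success rate."""
    return math.sqrt(p * (1 - p) / n_trials)

def simulated_sd(n_trials, runs=2000, p=0.5, seed=7):
    """Standard deviation of success rates across many simulated groups."""
    rng = random.Random(seed)
    rates = [sum(rng.random() < p for _ in range(n_trials)) / n_trials
             for _ in range(runs)]
    return statistics.stdev(rates)

# Hypothetical trial counts: a control group six times the size of an experimental one.
for n in (100, 600):
    print(n, round(rate_sd(n), 4), round(simulated_sd(n), 4))
```

The smaller group should almost always be the noisier one, which is why a smaller group repeatedly showing less variance than a larger one stands out.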

The Control Isn’t a Control

All too often, researchers using frequentist statistics get blinded by the way p-values take the null hypothesis for granted, and don’t bother checking their control groups. Bem’s fairly good about this, but we can do better.

All of Bem’s experiments, save 3 and 4, rely on Bernoulli processes; every person has some probability of guessing the next binary choice correctly, due possibly to inherent precognitive ability, and that probability does not change with time. It follows that the number of successful guesses has a binomial distribution, which can be written:

$$P(s \mid p, f) = \frac{(s+f)!}{s!\,f!}\, p^{s} (1-p)^{f}$$

where s is the number of successes, f the number of failures, and p the probability of success; that means P(s | p, f) translates to “the probability of having s successes, given that the probability of success is p and there were f failures.” Naturally, p must be between 0 and 1.
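In code, the formula is a one-liner. This sketch (the function name is my own) uses Python's `math.comb` for the binomial coefficient and checks two properties: for a fixed number of trials the probabilities over s sum to one, and as a function of p the curve peaks at s/(s+f).

```python
import math

def binomial_likelihood(s, f, p):
    """P(s | p, f): probability of s successes and f failures in s + f
    independent trials that each succeed with probability p."""
    return math.comb(s + f, s) * p**s * (1 - p) ** f

# Four 1s in thirty-six rolls of a fair die.
print(binomial_likelihood(4, 32, 1 / 6))

# For a fixed number of trials (36), the probabilities over s sum to one.
total = sum(binomial_likelihood(s, 36 - s, 1 / 6) for s in range(37))

# As a function of p, the likelihood peaks at s / (s + f) = 4/36.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: binomial_likelihood(4, 32, p))
print(total, best_p)
```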

Let’s try a thought experiment: say you want to test if a single six-sided die is biased to come up 1. You roll it thirty-six times, and observe four instances where it comes up 1. Your friend tosses it seventy-two times, and spots fifteen instances of 1. You’d really like to pool your results together and get a better idea of how fair the die is; how would you do this? If you answered “just add all the successes together, as well as the failures,” you nailed it!

[Figure: The probability distribution of rolling a 1 for a given die, according to you and your friend’s experiments.]

The results look pretty good; both you and your friend would have suspected the die was biased based on your individual rolls, but the combined distribution looks like what you’d expect from a fair die.
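Here's why adding up the counts works, as a small sketch using the numbers from the thought experiment. Multiplying your likelihood by your friend's gives the same curve as the likelihood of the pooled counts, up to a constant factor (a ratio of binomial coefficients), so once normalised over p the two are identical. The `likelihood` helper is my own.

```python
import math

def likelihood(s, f, p):
    """Binomial likelihood of s successes and f failures at success probability p."""
    return math.comb(s + f, s) * p**s * (1 - p) ** f

yours = (4, 32)     # four 1s in thirty-six rolls
friends = (15, 57)  # fifteen 1s in seventy-two rolls
pooled = (yours[0] + friends[0], yours[1] + friends[1])  # (19, 89)

# The product of the separate likelihoods divided by the pooled likelihood
# is the same constant at every p, so both describe the same curve.
ratios = [likelihood(*yours, p) * likelihood(*friends, p) / likelihood(*pooled, p)
          for p in (0.1, 1 / 6, 0.3, 0.5)]
print(ratios)

# The pooled curve peaks at 19/108, roughly 0.176, close to a fair die's 1/6.
print(pooled[0] / sum(pooled))
```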

But my Bayes 208 post was on conjugate distributions, which defang a lot of the mathematical complexity that comes from Bayesian methods by allowing you to merge statistical distributions. Sit back and think about what just happened: both you and your friend examined the same Bernoulli process, resulting in two experiments and two different binomial distributions. When we combined both experiments, we got back another binomial distribution. The only way this differs from Bayesian conjugate distributions is the labeling; had I declared your binomial to be the prior, and your friend’s to be the likelihood, it’d be obvious the combination was the posterior distribution for the odds of rolling a 1.

Well, almost the only difference. Most sources don’t list the binomial distribution as the conjugate for this situation, but instead the Beta distribution:

$$\mathrm{Beta}(p \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$$

But I think you can work out the two are almost identical, without any help from me. The only real advantage of the Beta distribution is that it allows non-integer successes and failures, thanks to the Gamma function, which in turn permits a nice selection of priors.
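For anyone who does want the help, this sketch checks the claim numerically. With α = s + 1 and β = f + 1, and Γ(n + 1) = n!, the Beta density is the binomial likelihood times the constant (s + f + 1). The `beta_pdf` helper is my own and works in log space via `math.lgamma` to avoid overflow.

```python
import math

def beta_pdf(p, a, b):
    """Beta(p | a, b) via log-Gamma, so a and b need not be integers."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))

def binomial_likelihood(s, f, p):
    return math.comb(s + f, s) * p**s * (1 - p) ** f

s, f = 19, 89  # the pooled die rolls from the thought experiment
for p in (0.1, 1 / 6, 0.25):
    beta = beta_pdf(p, s + 1, f + 1)
    scaled = (s + f + 1) * binomial_likelihood(s, f, p)
    print(p, beta, scaled)  # equal up to floating-point rounding

# Non-integer parameters are fine too, e.g. the Jeffreys prior Beta(1/2, 1/2).
print(beta_pdf(0.3, 0.5, 0.5))
```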

In theory, then, it’s dirt easy to do a Bayesian analysis of Bem’s handiwork: tally up the successes and failures from each individual experiment, add them together, and plunk them into a binomial distribution. In practice, there are three hurdles. The easy one is the choice of prior; fortunately, Bem’s datasets are large enough that they swamp any reasonable prior, so I’ll just use the Bayes-Laplace one and be done with it. A bigger one is that we’ve got at least three distinct Bernoulli processes in play: pressing a button to classify an image (experiments 3, 4), remembering a word from a list (8, 9), and guessing the next image out of a binary pair (everything else). If you’re trying to describe precognition and think it varies depending on the input image, then the negative image trials have to be separated from the erotic image ones. Still, this amounts to little more than being careful with the datasets and thinking hard about how a universal precognition would be expressed via those separate processes.
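The bookkeeping half of that recipe can be sketched in a few lines. The tallies below are invented placeholders, not Bem's numbers, and the process labels and `pooled_posteriors` name are my own; the point is only that each Bernoulli process keeps its own running total, and that the Bayes-Laplace prior amounts to starting every tally at one success and one failure, giving a Beta(1, 1) prior.

```python
from collections import defaultdict

# Invented tallies for illustration only: (process, successes, failures).
runs = [
    ("binary_pair", 520, 480),
    ("binary_pair", 260, 250),
    ("word_recall", 300, 310),
]

def pooled_posteriors(runs, prior=(1, 1)):
    """Return Beta(alpha, beta) posterior parameters for each process.

    prior=(1, 1) is the Bayes-Laplace (uniform) prior; pooling runs of the same
    process is just adding their successes and failures.
    """
    totals = defaultdict(lambda: list(prior))
    for process, successes, failures in runs:
        totals[process][0] += successes
        totals[process][1] += failures
    return {process: tuple(ab) for process, ab in totals.items()}

posteriors = pooled_posteriors(runs)
for process, (a, b) in posteriors.items():
    print(process, a, b, round(a / (a + b), 4))  # posterior mean of p
```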

The toughest of the bunch: Bem didn’t record the number of successes and failures, save for experiments 8 and 9. Instead, he either saved log timings (experiments 3 and 4) or the success rate, as a percentage of all trials. This is common within frequentist statistics, which is obsessed with maximal likelihoods, but it destroys information we could use to build a posterior distribution. Still, this omission isn’t fatal. We know the number of successes and failures are integer values. If we correctly guess their sum and multiply it by the rate, the result will be an integer; if we pick an incorrect sum, it’ll be a fraction. A complication arrives if there are common factors between the number of successes and the total trials, but there should be some results which lack those factors. By comparing results to one another, we should be able to work out both what the underlying total was and when that total changes, and in the process we learn the number of successes and can work backwards to the number of failures.
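The same-denominator trick can be sketched in Python under a few stated assumptions. The reported rates are percentages rounded to a known number of decimal places, every session in a batch shares the same unknown total, and the rates and search range below are hypothetical, not Bem's. For each candidate total we take the nearest whole number of successes and keep the total only if it reproduces the reported rate. We then intersect the candidates across sessions.

```python
from functools import reduce


def candidate_totals(rate_percent: float, decimals: int, n_min: int, n_max: int) -> dict:
    """Map each total n in [n_min, n_max] that can reproduce the rounded rate
    to the implied number of successes k."""
    tolerance = 0.5 * 10 ** -decimals + 1e-9  # half a unit in the last reported place
    matches = {}
    for n in range(n_min, n_max + 1):
        k = round(rate_percent / 100 * n)  # nearest whole number of successes
        if abs(100 * k / n - rate_percent) <= tolerance:
            matches[n] = k
    return matches


# Hypothetical per-session success rates, reported to two decimal places.
reported = [44.44, 55.56, 75.00, 58.33]
per_rate = [candidate_totals(r, 2, 10, 60) for r in reported]

# Only totals consistent with every session survive.
common = reduce(set.intersection, (set(c) for c in per_rate))
print(sorted(common))  # [36]

n = min(common)
for rate, cands in zip(reported, per_rate):
    print(f"{rate:.2f}% of {n} trials -> {cands[n]} successes, {n - cands[n]} failures")
```

On its own, a rate like 75% (3/4) fits every multiple of 4, which is the common-factor complication. It is combining several rates that pins the total down.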

As the heading suggests, there’s something interesting hidden in the control groups. I’ll start with the binary image pair controls, which behave a lot like a coin flip; as the samples pile up, we’d expect the control distribution to migrate to the 50% line. When we do all the gathering, we find…

What happens when we combine the control groups for the binary image process from Bem (2011).

… that’s not good. Experiment 1 had a great control group, but the controls from experiments 5 and 6 are oddly skewed. Since they had a lot more samples, they wind up dominating the posterior distribution, and we find fully 92.5% of the distribution below the expected value of p = 0.5. This sets a bad precedent, because we now know that Bem’s methodology can create a skew of 0.67% away from 50%; for comparison, the combined signal from all studies was a skew of 0.83%. Are there bigger skews in the methodology of experiments 2, 3, 4, or 7? We’ve got no idea, because Bem never ran control groups for them.
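A number like "92.5% below p = 0.5" comes from integrating the pooled posterior from 0 to 0.5. The sketch below does that with the Python standard library. It assumes a Bayes-Laplace Beta(1, 1) prior and uses the midpoint rule in place of a proper incomplete-beta routine. The control tallies are hypothetical placeholders, not Bem's data.

```python
from math import exp, lgamma, log


def beta_log_pdf(p: float, a: float, b: float) -> float:
    """Log-density of Beta(a, b) at 0 < p < 1; log space avoids overflow for large counts."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(p) + (b - 1) * log(1 - p))


def beta_cdf(x: float, a: float, b: float, steps: int = 20_000) -> float:
    """P(p < x) under Beta(a, b), via the midpoint rule (never evaluates p = 0)."""
    width = x / steps
    return width * sum(exp(beta_log_pdf((i + 0.5) * width, a, b)) for i in range(steps))


def pooled_posterior(tallies, prior=(1.0, 1.0)):
    """Add every experiment's (successes, failures) onto a Beta prior."""
    a, b = prior
    for successes, failures in tallies:
        a += successes
        b += failures
    return a, b


# Hypothetical control-group tallies as (successes, failures); NOT Bem's numbers.
controls = [(52, 48), (480, 520), (505, 535)]
a, b = pooled_posterior(controls)
print(f"posterior Beta({a:.0f}, {b:.0f}); P(p < 0.5) = {beta_cdf(0.5, a, b):.3f}")
```

Large control groups dominate the sums in `pooled_posterior`, which is how two skewed experiments can outvote a clean one.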

Experiments 3 and 4 lack any sort of control, so we’re left to consider the strongest pair of experiments in Bem’s paper, 8 and 9. Bem used a Differential Recall score instead of the raw guess count, because it gives the null effect an expected value of zero. This Bayesian analysis can cope with a non-zero null, so I’ll just use a conventional success/failure count.

Experiments 8 and 9 from Bem's 2011 paper.

On the surface, everything’s on the up-and-up. The controls have more datapoints between them than the treatment group, but there’s good and consistent separation between them and the treatment. Look very carefully at the numbers on the bottom, though; the effects are in quite different places. That’s strange, given the second study only differs from the first via some extra practice (page 14); I can see that improving the main control and treatment groups, but why does it also drag along the no-practice groups? Either there aren’t enough samples here to get rid of random noise, which seems unlikely, or the methodology changed enough to spoil the replication.

Come to think of it, one of those controls isn’t exactly a control. I’ll let Bem explain the difference.

Participants were first shown a set of words and given a free recall test of those words. They were then given a set of practice exercises on a randomly selected subset of those words. The psi hypothesis was that the practice exercises would retroactively facilitate the recall of those words, and, hence, participants would recall more of the to-be-practiced words than the unpracticed words. […]

Although no control group was needed to test the psi hypothesis in this experiment, we ran 25 control sessions in which the computer again randomly selected a 24-word practice set but did not actually administer the practice exercises. These control sessions were interspersed among the experimental sessions, and the experimenter was uninformed as to condition. [page 13]

So the “no-practice treatment,” as I dubbed it in the charts, is actually a test of precognition! It happens to be a lousy one: without a round of post-hoc practice to prepare subjects, their performance should be poor. Nonetheless, we’d expect it to be as good as or better than the matching controls. So why, instead, was it consistently worse? And not just a little worse, either; for experiment 9, it fell as far below its control as the main control fell below its treatment group.

What it all Means

I know, I seem to be a touch obsessed with one social science paper. The reason has less to do with the paper than the context around it: you can make a good argument that the current reproducibility crisis is thanks to Bem. Take the words of E.J. Wagenmakers et al.:

Instead of revising our beliefs regarding psi, Bem’s research should instead cause us to revise our beliefs on methodology: The field of psychology currently uses methodological and statistical strategies that are too weak, too malleable, and offer far too many opportunities for researchers to befuddle themselves and their peers. […]

We realize that the above flaws are not unique to the experiments reported by Bem (2011). Indeed, many studies in experimental psychology suffer from the same mistakes. However, this state of affairs does not exonerate the Bem experiments. Instead, these experiments highlight the relative ease with which an inventive researcher can produce significant results even when the null hypothesis is true. This evidently poses a significant problem for the field and impedes progress on phenomena that are replicable and important.

Wagenmakers, Eric-Jan, et al. “Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011).” (2011): 426.

When it was pointed out Bayesian methods wiped away his results, Bem started doing Bayesian analysis. When others pointed out a meta-analysis could do the same, Bem did that too. You want open data? Bem was a hipster on that front, sharing his data around to interested researchers and now the public. He’s been pushing for replication, too, and in recent years has begun pre-registering studies to stem the garden of forking paths. Bem appears to be following the rules of science, to the letter.

I also know from bitter experience that any sufficiently large research project will run into data quality issues. But, now that I’ve looked at Bem’s raw data, I’m feeling hoodwinked. I expected a few isolated issues, but nothing on this scale. If Bem’s 2011 paper really is a type specimen for what’s wrong with the scientific method, as practiced, then it implies that most scientists are garbage at designing experiments and collecting data.

I’m not sure I can accept that.

One Hundred Prisoners

Here’s a question to puzzle out:

An especially cruel jailer announces a “game” to their 100 prisoners. A cabinet with 100 drawers sits in a heavily monitored room. In each drawer lies one prisoner’s number. If every prisoner draws their own number from a drawer, every one of them walks free; if even one of them fails, however, all the prisoners must spend the rest of their days in solitary confinement. Prisoners must reset the drawers and room after their attempt, otherwise all of them head to solitary, and to ensure they cannot give each other hints, everyone goes directly to solitary after their attempt. The jailer does offer a little mercy, though: prisoners can check up to half the drawers in the cabinet during their attempt, and collectively they have plenty of time to brainstorm a strategy.

What is the best one they could adopt?

This seems like a hopeless situation, no doubt. The odds of any one prisoner randomly finding their number are 50%, and the odds of all 100 prisoners pulling that off are so low they make death by shark look like a sure thing.
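For scale, assume each prisoner opens 50 drawers chosen at random, independently of everyone else. The chance that all 100 succeed is then:

$$P(\text{all 100 succeed}) = \left(\tfrac{1}{2}\right)^{100} = 2^{-100} \approx 7.9 \times 10^{-31}$$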

Nonetheless, the prisoners settle on a strategy. With a little programming code, we can evaluate the chances it’ll grant them all their freedom.

      Algorithm    Trials    Successes    Success (%)
   Random Guess     50000            0      0.0000000
         Cyclic     50000        15687     31.3740000

Whhaaa? How can the prisoners pull off odds like that?
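This sketch assumes "Cyclic" in the table refers to the well-known loop-following strategy. Each prisoner first opens the drawer bearing their own number, then the drawer named by the slip inside, and so on. A small Python simulation of that assumption reproduces the gap, and the function names are hypothetical. Under this strategy everyone wins exactly when the shuffled slips contain no loop longer than 50. That happens with probability 1 − (1/51 + 1/52 + … + 1/100) ≈ 0.3118.

```python
import random


def cyclic_attempt(drawers: list, prisoner: int, max_opens: int) -> bool:
    """Follow the loop starting at the drawer with the prisoner's own number."""
    drawer = prisoner
    for _ in range(max_opens):
        if drawers[drawer] == prisoner:
            return True
        drawer = drawers[drawer]  # next, open the drawer named by this slip
    return False


def random_attempt(drawers: list, prisoner: int, max_opens: int, rng: random.Random) -> bool:
    """Open max_opens distinct drawers chosen uniformly at random."""
    return any(drawers[d] == prisoner for d in rng.sample(range(len(drawers)), max_opens))


def run(trials: int, n: int = 100, seed: int = 0) -> dict:
    rng = random.Random(seed)
    wins = {"random": 0, "cyclic": 0}
    for _ in range(trials):
        drawers = list(range(n))  # drawers[i] is the number on the slip in drawer i
        rng.shuffle(drawers)
        half = n // 2
        if all(random_attempt(drawers, p, half, rng) for p in range(n)):
            wins["random"] += 1
        if all(cyclic_attempt(drawers, p, half) for p in range(n)):
            wins["cyclic"] += 1
    return wins


def exact_cyclic(n: int = 100) -> float:
    """Probability that no loop in a random arrangement is longer than n/2."""
    return 1 - sum(1 / k for k in range(n // 2 + 1, n + 1))


print(run(2_000))
print(round(exact_cyclic(), 4))  # 0.3118
```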

The Tuskegee Syphilis Study

Was it three years ago? Almost to the day, from the looks of it.

Biomedical research, then, promises vast increases in life, health, and flourishing. Just imagine how much happier you would be if a prematurely deceased loved one were alive, or a debilitated one were vigorous — and multiply that good by several billion, in perpetuity. Given this potential bonanza, the primary moral goal for today’s bioethics can be summarized in a single sentence.

Get out of the way.

A truly ethical bioethics should not bog down research in red tape, moratoria, or threats of prosecution based on nebulous but sweeping principles such as “dignity,” “sacredness,” or “social justice.” Nor should it thwart research that has likely benefits now or in the near future by sowing panic about speculative harms in the distant future.

That was Steven Pinker arguing that biomedical research is too ethical. Follow that link and you’ll see my counter-example: the Tuskegee syphilis study. It is a literal textbook example of what not to do in science. Pinker didn’t mention it back then, but it was inevitable he’d have to deal with it at some time. Thanks to PZ, I now know he has.

At a recent conference, another colleague summed up what she thought was a mixed legacy of science: vaccines for smallpox on the one hand; the Tuskegee syphilis study on the other. In that affair, another bloody shirt in the standard narrative about the evils of science, public health researchers, beginning in 1932, tracked the progression of untreated latent syphilis in a sample of impoverished African Americans for four decades. The study was patently unethical by today’s standards, though it’s often misreported to pile up the indictment. The researchers, many of them African American or advocates of African American health and well-being, did not infect the participants as many people believe (a misconception that has led to the widespread conspiracy theory that AIDS was invented in US government labs to control the black population). And when the study began, it may even have been defensible by the standards of the day: treatments for syphilis (mainly arsenic) were toxic and ineffective; when antibiotics became available later, their safety and efficacy in treating syphilis were unknown; and latent syphilis was known to often resolve itself without treatment. But the point is that the entire equation is morally obtuse, showing the power of Second Culture talking points to scramble a sense of proportionality. My colleague’s comparison assumed that the Tuskegee study was an unavoidable part of scientific practice as opposed to a universally deplored breach, and it equated a one-time failure to prevent harm to a few dozen people with the prevention of hundreds of millions of deaths per century in perpetuity.

What horse shit.

To persuade the community to support the experiment, one of the original doctors admitted it “was necessary to carry on this study under the guise of a demonstration and provide treatment.” At first, the men were prescribed the syphilis remedies of the day — bismuth, neoarsphenamine, and mercury — but in such small amounts that only 3 percent showed any improvement. These token doses of medicine were good public relations and did not interfere with the true aims of the study. Eventually, all syphilis treatment was replaced with “pink medicine” — aspirin. To ensure that the men would show up for a painful and potentially dangerous spinal tap, the PHS doctors misled them with a letter full of promotional hype: “Last Chance for Special Free Treatment.” The fact that autopsies would eventually be required was also concealed. As a doctor explained, “If the colored population becomes aware that accepting free hospital care means a post-mortem, every darky will leave Macon County…”

  • “it equated a one-time failure to prevent harm to a few dozen people”: In reality, according to that last source, “28 of the men had died directly of syphilis, 100 were dead of related complications, 40 of their wives had been infected, and 19 of their children had been born with congenital syphilis.” As of August last year, 12 former children were still receiving financial compensation.
  • “the prevention of hundreds of millions of deaths per century in perpetuity”: In reality, the Tuskegee study wasn’t the only scientific study looking at syphilis. Nor even the first. Syphilis was discovered in 1494, named in 1530, the causative organism was found in 1905, and the first treatments were developed in 1910. The science was dubious at best:

The study was invalid from the very beginning, for many of the men had at one time or another received some (though probably inadequate) courses of arsenic, bismuth and mercury, the drugs of choice until the discovery of penicillin, and they could not be considered untreated. Much later, when penicillin and other powerful antibiotics became available, the study directors tried to prevent any physician in the area from treating the subjects – in direct opposition to the Henderson Act of 1943, which required treatment of venereal diseases.

A classic study of untreated syphilis had been completed years earlier in Oslo. Why try to repeat it? Because the physicians who initiated the Tuskegee study were determined to prove that syphilis was “different” in blacks. In a series of internal reviews, the last done as recently as 1969, the directors spoke of a “moral obligation” to continue the study. From the very beginning, no mention was made of a moral obligation to treat the sick.

Pinker’s response to the Tuskegee study is to rewrite history to suit his narrative, again. No wonder he isn’t a fan of ethics.

EvoPsych and Scientific Racism

I’m not a fan of EvoPsych. It manages the feat of misunderstanding both evolution and psychology, its researchers are prone to wild misrepresentation of fields they clearly don’t understand, and it has all the trappings of a pseudo-science. Nonetheless, I’ve always thought they had enough sense to avoid promoting scientific racism, at least openly.

[CONTENT WARNING: Some of them don’t.]


Computational Propaganda

Sick of all this memo talk? Too bad, because thanks to Lynna, OM, in the Political Madness thread, I discovered a new term: “computational propaganda,” or the use of computers to help spread talking points and generate “grassroots” activism. It’s a lot more advanced than running a few bots, too. You’ll have to read the article to learn the how and why, but I can entice you with its conclusion:

The problem with the term “fake news” is that it is completely wrong, denoting a passive intention. What is happening on social media is very real; it is not passive; and it is information warfare. There is very little argument among analytical academics about the overall impact of “political bots” that seek to influence how we think, evaluate and make decisions about the direction of our countries and who can best lead us—even if there is still difficulty in distinguishing whose disinformation is whose. Samantha Bradshaw, a researcher with Oxford University’s Computational Propaganda Research Project who has helped to document the impact of “polbot” activity, told me: “Often, it’s hard to tell where a particular story comes from. Alt-right groups and Russian disinformation campaigns are often indistinguishable since their goals often overlap. But what really matters is the tools that these groups use to achieve their goals: Computational propaganda serves to distort the political process and amplify fringe views in ways that no previous communication technology could.”

This machinery of information warfare remains within social media’s architecture. The challenge we still have in unraveling what happened in 2016 is how hard it is to pry the Russian components apart from those built by the far- and alt-right—they flex and fight together, and that alone should tell us something. As should the fact that there is a lesser far-left architecture that is coming into its own as part of this machine. And they all play into the same destructive narrative against the American mind.

Democracies have not faced a challenge like this since yellow journalism.

The Nunes Memo

To understand what the memo is talking about, you have to trace back three separate threads.

Carter Page. In 2013, a number of Russian government agents tried to recruit him, via a lucrative deal with Gazprom. In return, Page admits to feeding them documents (which he claims were “basic immaterial information and publicly available research documents”). As luck would have it, at least one of those agents was already under surveillance (which would later lead to a conviction for espionage), so the FBI asked for and got a FISA warrant to ensure Page wasn’t part of a Russian spy network. That would have been scary, as Page was an advisor to then-candidate Donald Trump. These warrants expire every 90 days, and if the government wants them renewed they have to plead their case to a judge in a special court. The warrant against Page was renewed at least four times, by multiple people and multiple judges.

George Papadopoulos. According to the New York Times, “During a night of heavy drinking at an upscale London bar in May 2016, George Papadopoulos, a young foreign policy adviser to the Trump campaign, made a startling revelation to Australia’s top diplomat in Britain: Russia had political dirt on Hillary Clinton.” By “dirt,” he meant “stolen emails;” a spear-phishing campaign against the DNCC by a Russian government hacking group dubbed “Fancy Bear” succeeded in mid-March 2016, and on April 26th a Russian intelligence agent was teasing Papadopoulos with the emails. When they were released to the public in July 2016, the FBI opened an investigation into Papadopoulos. According to his guilty plea to the Special Counsel, Papadopoulos was bullish on getting Trump to meet Vladimir Putin and kept up his contacts with Russian spies during his time on the Trump campaign. As late as December 4th of 2016, Papadopoulos was publicly calling himself a Trump advisor.

Fusion GPS. In multiple congressional testimonies, Glenn Simpson said that his company was hired by The Washington Free Beacon to dig up dirt on Trump. Fusion GPS’s research found a tonne of it, most notably potential money laundering for Russian oligarchs. Simpson hired Christopher Steele, a former British spy with decades of Russia experience and a solid track record, to investigate the Russian angle. Steele’s work would eventually become “the Trump Dossier” (which may need to be renamed, as I somehow missed a second dossier). Partway through the work that produced the 17 separate memos of The Dossier, Trump was crowned the official Republican presidential candidate and The Washington Free Beacon stopped funding the research; the Democratic National Committee then picked up the tab. What Steele found had him so terrified that he, with Simpson’s approval, started sharing the information around. His first contact was an FBI agent in Rome; when the FBI itself seemed oddly uninterested, he went to journalist David Corn and later Senator John McCain. The Dossier’s veracity has improved over time, and the Kremlin may have iced one of Steele’s contacts.

You’ll need a bit more background to fully grok it, but I can sprinkle that in while covering the claims of the memo itself.


Winning Hearts and Minds

I’ll forgive you if you haven’t heard of Ian Danskin, if only because he’s primarily known on YouTube as Innuendo Studios. You know, the person behind “Why Are You So Angry?” and more recently “The Alt-Right Playbook.” The latter project is aimed at sharpening the rhetoric of progressives to better defend against the “playbook” the Alt-Right uses in online arguments. It’s still a work in progress, but recently Danskin tried to jump ahead and compress it all into a single lecture.
