Daryl Bem and the Replication Crisis

I’m disappointed I don’t see more recognition of this.

If one had to choose a single moment that set off the “replication crisis” in psychology—an event that nudged the discipline into its present and anarchic state, where even textbook findings have been cast in doubt—this might be it: the publication, in early 2011, of Daryl Bem’s experiments on second sight.

I’ve actually done a long blog post series on the topic, but in brief: Daryl Bem was convinced that precognition existed. To put these beliefs to the test, he had subjects try to predict an image that was randomly generated by a computer. Over eight experiments, he found that they could indeed do better than chance. You might think that Bem is a kook, and you’d be right.

But Bem is also a scientist.

Now he would return to JPSP [the Journal of Personality and Social Psychology] with the most amazing research he’d ever done—that anyone had ever done, perhaps. It would be the capstone to what had already been a historic 50-year career.

Having served for a time as an associate editor of JPSP, Bem knew his methods would be up to snuff. With about 100 subjects in each experiment, his sample sizes were large. He’d used only the most conventional statistical analyses. He’d double- and triple-checked to make sure there were no glitches in the randomization of his stimuli. Even with all that extra care, Bem would not have dared to send in such a controversial finding had he not been able to replicate the results in his lab, and replicate them again, and then replicate them five more times. His finished paper lists nine separate ministudies of ESP. Eight of those returned the same effect.

One way to attack an argument is to merely follow its logic. If you can find it leads to an absurd conclusion, the argument must have been flawed even if you cannot find the flaw. Bem had inadvertently discovered a “reductio ad absurdum” argument against contemporary scientific practice: if proper scientific procedure can prove ESP exists, proper scientific procedure must be broken.

Meanwhile, at the conference in Berlin, [E.J.] Wagenmakers finally managed to get through Bem’s paper. “I was shocked,” he says. “The paper made it clear that just by doing things the regular way, you could find just about anything.”

On the train back to Amsterdam, Wagenmakers drafted a rebuttal, to be published in JPSP alongside the original research. The problems he saw in Bem’s paper were not particular to paranormal research. “Something is deeply wrong with the way experimental psychologists design their studies and report their statistical results,” Wagenmakers wrote. “We hope the Bem article will become a signpost for change, a writing on the wall: Psychologists must change the way they analyze their data.”

Slate has a long read up on the current replication crisis, and how it links to Bem. It’s aimed at a lay audience and highly readable; I recommend giving it a click.

So You Wanna Falsify Gender Studies?

How would a skeptic determine whether or not an area of study was legit? The obvious route would be to study up on the core premises of that field, recording citations as you go; map out how they are connected to one another and supported by the evidence, looking for weak spots; then write a series of articles sharing those findings.

What they wouldn’t do is generate a fake paper purporting to be from that field of study but deliberately mangling the terminology, submit it to a low-ranked and obscure journal for peer review, have it rejected from that journal, then, based on the feedback, submit it to a second journal that was semi-shady and even more obscure, have it published, then parade that around as if it meant something.

Alas, it seems the Skeptic movement has no idea how basic skepticism works. Self-proclaimed “skeptics” Peter Boghossian and James Lindsay took the second route, and were cheered on by Michael Shermer, Richard Dawkins, Jerry Coyne, Steven Pinker, and other people calling themselves skeptics. A million other people have pointed and laughed at them, so I won’t bother joining in.

But no-one seems to have brought up the first route. Let’s do a sketch of actual skepticism, then, and see how well gender studies holds up.

What’s Claimed?

Right off the bat, we hit a problem: most researchers or advocates in gender studies do not have a consensus sex or gender model.

The Genderbread Person, version 3.3. From http://itspronouncedmetrosexual.com/2015/03/the-genderbread-person-v3/

This is one of the more popular explainers for gender floating out on the web. Rather than focus on the details, however, I’d like you to note this graphic is labeled “version 3.3”. In other words, Sam Killermann has tweaked and revised it three times over. It also conflicts with the Gender Unicorn, which has a categorical approach to “biological sex” and adds “other genders,” and it no longer embraces the idea of a spectrum, thus contradicting a lot of other models. Confront Killermann on this, and I bet they’d shrug their shoulders and start crafting another model.

The model isn’t all that important. Instead, gender studies has reached a consensus on an axiom and a corollary: the two-sex, two-gender model is an oversimplification, and sex/gender are complicated. That is why models of sex or gender continually fail: the complexity almost guarantees exceptions to your rules.

There’s a strong parallel here to agnostic atheism’s “lack of belief” posture, as this flips the burden of proof. Critiquing the consensus of gender studies means asserting a positive statement, that the binarist model is correct, while the defense merely needs to swat down those arguments without advancing any of its own.

Nothing Fails Like Binarism

A single counter-example is sufficient to refute a universal rule. To take a classic example, I can show “all swans are white” is a false statement by finding a single black swan. If someone came along and said “well yeah, but most swans are white, so we can still say that all swans are white,” you’d think of them as delusional or in denial.
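The swan logic can be sketched in a few lines of Python. This is only an illustration: the list of observations and the `is_white` helper are invented for the example.

```python
# Hypothetical observations; the data is invented for illustration.
swans = ["white", "white", "white", "black", "white"]

def is_white(colour):
    """Return True when an observed swan is white."""
    return colour == "white"

# A universal claim holds only if every observation satisfies it.
all_swans_white = all(is_white(c) for c in swans)

# A single counter-example is enough to refute it.
counter_examples = [c for c in swans if not is_white(c)]

# "Most swans are white" is a different, weaker claim.
share_white = sum(is_white(c) for c in swans) / len(swans)

print(all_swans_white)    # False
print(counter_examples)   # ['black']
print(share_white)        # 0.8
```

The universal claim is false even though four out of five swans in the sample are white; the majority share does nothing to rescue it.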

Well, I can point to four people who do not fit into the two-sex two-gender model. Ergo, that model cannot be true in all cases, and the critique of gender studies fails after a thirty-second Google search.

When most people are confronted with this, they invoke a three-sex model (male, female, and “other/defective”) but call it two-sex in order to preserve their delusion. That so few people notice the contradiction is a testament to how hard the binary model is hammered into us.

But Where’s the SCIENCE?!

Another popular dodge is to argue that merely saying you don’t fit into the binary isn’t enough; if it wasn’t in peer-reviewed research, it can’t be true. This is no less silly. Do I need to publish a paper about the continent of Africa to say it exists? Or my computer? If you doubt me, browse Retraction Watch for a spell.

Once you’ve come back, go look at the peer-reviewed research which suggests gender is more complicated than a simple binary.

At times, the prevailing answers were almost as simple as Gray’s suggestion that the sexes come from different planets. At other times, and increasingly so today, the answers concerning the why of men’s and women’s experiences and actions have involved complex multifaceted frameworks.

Ashmore, Richard D., and Andrea D. Sewell. “Sex/Gender and the Individual.” In Advanced Personality, edited by David F. Barone, Michel Hersen, and Vincent B. Van Hasselt, 377–408. The Plenum Series in Social/Clinical Psychology. Springer US, 1998. doi:10.1007/978-1-4419-8580-4_16.

Correlational findings with the three scales (self-ratings) suggest that sex-specific behaviors tend to be mutually exclusive while male- and female-valued behaviors form a dualism and are actually positively rather than negatively correlated. Additional analyses showed that individuals with nontraditional sex role attitudes or personality trait organization (especially cross-sex typing) were somewhat less conventionally sex typed in their behaviors and interests than were those with traditional attitudes or sex-typed personality traits. However, these relationships tended to be small, suggesting a general independence of sex role traits, attitudes, and behaviors.

Orlofsky, Jacob L. “Relationship between Sex Role Attitudes and Personality Traits and the Sex Role Behavior Scale-1: A New Measure of Masculine and Feminine Role Behaviors and Interests.” Journal of Personality and Social Psychology 40, no. 5 (May 1981): 927–40.

Women’s scores on the BSRI-M and PAQ-M (masculine) scales have increased steadily over time (r’s = .74 and .43, respectively). Women’s BSRI-F and PAQ-F (feminine) scale scores do not correlate with year. Men’s BSRI-M scores show a weaker positive relationship with year of administration (r = .47). The effect size for sex differences on the BSRI-M has also changed over time, showing a significant decrease over the twenty-year period. The results suggest that cultural change and environment may affect individual personalities; these changes in BSRI and PAQ means demonstrate women’s increased endorsement of masculine-stereotyped traits and men’s continued nonendorsement of feminine-stereotyped traits.

Twenge, Jean M. “Changes in Masculine and Feminine Traits over Time: A Meta-Analysis.” Sex Roles 36, no. 5–6 (March 1, 1997): 305–25. doi:10.1007/BF02766650.

Male (n = 95) and female (n = 221) college students were given 2 measures of gender-related personality traits, the Bem Sex-Role Inventory (BSRI) and the Personal Attributes Questionnaire, and 3 measures of sex role attitudes. Correlations between the personality and the attitude measures were traced to responses to the pair of negatively correlated BSRI items, masculine and feminine, thus confirming a multifactorial approach to gender, as opposed to a unifactorial gender schema theory.

Spence, Janet T. “Gender-Related Traits and Gender Ideology: Evidence for a Multifactorial Theory.” Journal of Personality and Social Psychology 64, no. 4 (1993): 624.

Oh sorry, you didn’t know that gender studies has been a science for over four decades? You thought it was just an invention of Tumblr, rather than a mad scramble by scientists to catch up with philosophers? Tsk, that’s what you get for pretending to be a skeptic instead of doing your homework.

I Hate Reading

One final objection is that field-specific jargon is hard to understand. Boghossian and Lindsay seem to think it follows that the jargon is therefore meaningless bafflegab. I’d hate to see what they’d think of a modern physics paper; jargon offers precise definitions and less typing to communicate your ideas, and while it can quickly become opaque to lay people, jargon is a necessity for serious science.

But let’s roll with the punch, and look outside of journals for evidence that’s aimed at a lay reader.

In Sexing the Body, Gender Politics and the Construction of Sexuality Fausto-Sterling attempts to answer two questions: How is knowledge about the body gendered? And, how gender and sexuality become somatic facts? In other words, she passionately and with impressive intellectual clarity demonstrates how in regards to human sexuality the social becomes material. She takes a broad, interdisciplinary perspective in examining this process of gender embodiment. Her goal is to demonstrate not only how the categories (men/women) humans use to describe other humans become embodied in those to whom they refer, but also how these categories are not reflected in reality. She argues that labeling someone a man or a woman is solely a social decision. «We may use scientific knowledge to help us make the decision, but only our beliefs about gender – not science – can define our sex» (p. 3) and consistently throughout the book she shows how gender beliefs affect what kinds of knowledge are produced about sex, sexual behaviors, and ultimately gender.

Gober, Greta. “Sexing the Body Gender Politics and the Construction of Sexuality.” Humana.Mente Journal of Philosophical Studies, 2012, Vol. 22, 175–187

Making Sex is an ambitious investigation of Western scientific conceptions of sexual difference. A historian by profession, Laqueur locates the major conceptual divide in the late eighteenth century when, as he puts it, “a biology of cosmic hierarchy gave way to a biology of incommensurability, anchored in the body, in which the relationship of men to women, like that of apples to oranges, was not given as one of equality or inequality but rather of difference” (207). He claims that the ancients and their immediate heirs—unlike us—saw sexual difference as a set of relatively unimportant differences of degree within “the one-sex body.” According to this model, female sexual organs were perfectly homologous to male ones, only inside out; and bodily fluids—semen, blood, milk—were mostly “fungible” and composed of the same basic matter. The model didn’t imply equality; woman was a lesser man, just not a thing wholly different in kind.

Altman, Meryl, and Keith Nightenhelser. “Making Sex (Review).” Postmodern Culture 2, no. 3 (January 5, 1992). doi:10.1353/pmc.1992.0027.

In Delusions of Gender the psychologist Cordelia Fine exposes the bad science, the ridiculous arguments and the persistent biases that blind us to the ways we ourselves enforce the gender stereotypes we think we are trying to overcome. […]

Most studies about people’s ways of thinking and behaving find no differences between men and women, but these fail to spark the interest of publishers and languish in the file drawer. The oversimplified models of gender and genes that then prevail allow gender culture to be passed down from generation to generation, as though it were all in the genes. Gender, however, is in the mind, fixed in place by the way we store information.

Mental schema organise complex experiences into types of things so that we can process data efficiently, allowing us, for example, to recognise something as a chair without having to notice every detail. This efficiency comes at a cost, because when we automatically categorise experience we fail to question our assumptions. Fine draws together research that shows people who pride themselves on their lack of bias persist in making stereotypical associations just below the threshold of consciousness.

Everyone works together to re-inforce social and cultural environments that soft-wire the circuits of the brain as male or female, so that we have no idea what men and women might become if we were truly free from bias.

Apter, Terri. “Delusions of Gender: The Real Science Behind Sex Differences by Cordelia Fine.” The Guardian, October 11, 2010, sec. Books.

Have At ‘r, “Skeptics”

You want to refute the field of gender studies? I’ve just sketched out the challenges you face on a philosophical level, and pointed you to the studies and books you need to refute. Have fun! If you need me I’ll be over here, laughing.

[HJH 2017-05-21: Added more links, minor grammar tweaks.]

[HJH 2017-05-22: Missed Steven Pinker’s Tweet. Also, this Skeptic fail may have gone mainstream:

Boghossian and Lindsay likely did damage to the cultural movements that they have helped to build, namely “new atheism” and the skeptic community. As far as I can tell, neither of them knows much about gender studies, despite their confident and even haughty claims about the deep theoretical flaws of that discipline. As a skeptic myself, I am cautious about the constellation of cognitive biases to which our evolved brains are perpetually susceptible, including motivated reasoning, confirmation bias, disconfirmation bias, overconfidence and belief perseverance. That is partly why, as a general rule, if one wants to criticize a topic X, one should at the very least know enough about X to convince true experts in the relevant field that one is competent about X. This gets at what Bryan Caplan calls the “ideological Turing test.” If you can’t pass this test, there’s a good chance you don’t know enough about the topic to offer a serious, one might even say cogent, critique.

Boghossian and Lindsay pretty clearly don’t pass that test. Their main claim to relevant knowledge in gender studies seems to be citations from Wikipedia and mockingly retweeting abstracts that they, as non-experts, find funny — which is rather like Sarah Palin’s mocking of scientists for studying fruit flies or claiming that Obamacare would entail “death panels.” This kind of unscholarly engagement has rather predictably led to a sizable backlash from serious scholars on social media who have noted that the skeptic community can sometimes be anything but skeptical about its own ignorance and ideological commitments.

When the scientists you claim to worship are saying your behavior is unscientific, maaaaybe you should take a hard look at yourself.]

Proof from Popularity (2)

Over-active Pattern Matching

One key element is our phenomenal ability to find patterns. Scientists have recently started to capitalize on this.

The “Rosetta” project followed the same pattern as many other distributed computing projects, at least at the start. David Baker and his colleagues were dealing with the difficult problem of protein folding. Proteins are the workhorses of all living things, and do everything from speeding up chemical reactions to transmitting signals in your brain… yet they’ve been remarkably difficult to study. The problem is not with our understanding of molecular forces, or the blueprints to make any protein, but the sheer number of computations required to combine both into a folded protein. On a single computer, crunching through the numbers can take years. [149]

Baker decided to solve that problem by distributing the work; his group created a software program that could do protein folding, then gave it away to anyone interested. As of this writing, about 30,000 people are running Rosetta on at least one computer, [150] and speeding up the process by at least that many times. Every single one of them is doing this voluntarily, with their only compensation being a pretty picture of the folding process in action.

Soon, however, Baker’s group was fielding emails about those pretty pictures. The people running Rosetta noticed the software would get “stuck” in places, or waste time on solutions that even a non-expert could tell would go nowhere. They wondered if there was any way to “nudge” the software with hints.

Baker decided to take the hint himself, and hired Zoran Popović and David Salesin to turn his science program into a game called “foldit.” Users could now do much more than help their computer along; they could team up with others to solve “puzzles,” compete with others to earn a high score, or even write their own scripts to help remove some of the grunt work.

Adding humans into the mix paid off. One user named “Vertex” came up with the “Blue Fuse” helper script, and with refinements developed by others it rapidly became the most popular such helper on “foldit.” [151] When Baker had a look at the code, however, he was astonished to find it was a near-duplicate of an algorithm his lab was privately testing. Some comparisons between the two revealed that in seven months, a community of novices had managed to best years of research by experts in protein folding. [152] Other citizen science projects have found the same pattern; pooling together the opinions of non-experts gives results equal to what an expert could churn out, in a fraction of the time.

This ability to suss out patterns comes at a cost, however. As mentioned in the Witness proof, we’re also prone to finding patterns that don’t exist. It takes a minor stretch of someone else’s imagination to start assigning a language to random sounds, or guess there’s a mind behind mindless processes. From that, gods can be formed.

Classism

Back in the Morality proof, I used a little game theory to describe how morality could have evolved as an instinct. In the process, I also showed how classism[153] could be bred into our bones.

Henri Tajfel made a career out of studying this, in fact. One fascinating study asked people to judge the length of lines. The participants were split into two groups, one of which was given the lines without any labels. The other group had their lines labelled by length, either “A” if the line was shorter than average or “B” if the line was longer. The second group automatically lumped lines by label, consistently guessing longer lengths for short lines and shorter lengths for longer ones to match up with their expectations for the two categories. The first group didn’t show any bias. [154]

What applies to lines also applies to people. Here’s an example from Tajfel himself:

The boys, who knew each other well, were divided into groups defined by flimsy and unimportant criteria [in this case, they were told it was how well they could count groups of dots; in reality, the researchers randomly assigned groups]. Their own individual interests were not affected by their choices, since they always assigned points to two other people and no one could know what any other boy’s choices were. The amounts of money were not trivial for them: each boy left the experiment with the equivalent of about a dollar. Inasmuch as they could not know who was in their group and who was in the other group, they could have adopted either of two reasonable strategies. They could have chosen the maximum-joint-profit point of the matrices, which would mean that the boys as a total group would get the most money out of the experimenters, or they could choose the point of maximum fairness. Indeed, they did tend to choose the second alternative when their choices did not involve a distinction between ingroup and outgroup. As soon as this differentiation was involved, however, they discriminated in favour of the ingroup. The only thing we needed to do to achieve this result was to associate their judgements of numbers of dots with the use of the terms “your group,” and “the other group” in the instructions […]

        Tajfel, Henri. “Experiments in intergroup discrimination.” Scientific American 223.5 (1970): 96-102.

It’s a sobering thought. If we can start favouring an ingroup and punishing an outgroup, along lines as arbitrary as how good we are at counting dots, what are the odds of us carving lines in the sand over skin colour, or genitalia?

Or for that matter, beliefs and rituals?

There are two key differences, however. You can’t change genitalia or skin colour, and gradations and variety are guaranteed. [155] Behaviour, however, is easily changed, allowing anyone to hop from outgroup to ingroup. It’s quite possible then for a behaviour-based ingroup to grow in size and dominate over all their outgroups. At a magical tipping point, the special treatment enjoyed within the ingroup outweighs any harm that could be inflicted by an outgroup, and the reverse is true from any outgroup’s point of view. There’s a strong incentive to switch, which only grows as more people give in. The eventual result is the disappearance of all outgroups, and the people that remain are nicer and more trusting to one another than they would have been in a non-classist situation.
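The tipping point can be illustrated with a deliberately crude toy model in Python. Everything in it is an assumption made for the sketch: there is one ingroup and one outgroup, the payoff of belonging to a group is simply its share of the population, and a fixed fraction (`switch_rate`) of the worse-off group switches sides each round.

```python
def simulate(ingroup_share, switch_rate=0.2, rounds=50):
    """Toy model: each round, some people move to whichever group pays better."""
    share = ingroup_share
    for _ in range(rounds):
        # Assumed payoffs: favours received scale with the size of your group.
        ingroup_payoff = share
        outgroup_payoff = 1.0 - share
        if ingroup_payoff > outgroup_payoff:
            share += switch_rate * (1.0 - share)   # outsiders join the ingroup
        elif ingroup_payoff < outgroup_payoff:
            share -= switch_rate * share           # members drift away
    return share

print(simulate(0.55))   # just above the tipping point: share climbs toward 1
print(simulate(0.45))   # just below it: share collapses toward 0
```

In this sketch the tipping point sits at exactly half the population, which is an artefact of the symmetric payoffs; the point is only that a small initial edge snowballs once switching is cheap.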

In theory, of course. There are a number of practical barriers to this utopia.

For one, the ease with which we divide ourselves means that in-groups are prone to splintering along the most trivial lines. This becomes a problem when the in-group cannot provide the benefits it once used to, or there is no real out-group, as there’s very little enforcing group cohesion. If only there were some way to invent an out-group, either by spreading tales of extreme debauchery and mayhem about a group that doesn’t live nearby, or simply making up an all-powerful one out of, say, folklore or leftover gods. If only.

I alluded to another in the Morality proof. Classism is easy prey for cheaters, who will happily wave around the in-group symbols but refuse to act as nice; the obvious counter is to add costly symbols and rituals, such as piercings and other body modifications. A less obvious one is to invoke an always-present, always-watching police-thing to ensure everyone toes the line.

Evolution

While the above two components are enough to get a religion off the ground, they don’t explain why religions have such staying power. For that, you need one more ingredient: evolution.

As I discussed in my chapter on the Design/Teleological proof, evolution applies to far more than biology. It’s a general-purpose feedback loop that works equally well with culture and ideas. Religions are no exception, as they have all the basic requirements.

Traditions and rituals are easily passed from person to person, forming new copies of themselves. African-Americans brought over to the United States for the slave trade readily absorbed the religion of their captors, to the point that they are more likely to be Christian than those of European descent. [156]

The Christianity they adopted differs in important ways, however. African-Americans placed far more emphasis on music; while their European counterparts specialized in boring chants of ancient lyrics, they formed lively choirs of freshly-minted words and created an entire musical genre known as “gospel.” [157] While their fellow European citizens became used to dealing with a distant, aloof church system, African-Americans made theirs keenly interested in social justice and humanitarian causes. [158] By changing Christianity to suit them, they made it more suitable and thus tougher to walk away from.

This hasn’t escaped the notice of non-Africans. Faced with emptying pews, church leaders elsewhere have started adopting the innovations of African-Americans to woo churchgoers back. The use of popular music is on the uptick, [159] as well as an emphasis on charity and improving the lot of your fellow human. [160]

Self-replication, variation, limited environment, and feedback. Every aspect is there, creating a feedback loop of self-preserving replication.
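Those four ingredients can be put into a toy Python simulation. This is a sketch under loud assumptions: a “tradition” is boiled down to a single made-up `appeal` score between 0 and 1, the environment holds a fixed number of traditions (`capacity`), and the feedback is simply that the most appealing variants survive.

```python
import random

def evolve(generations=200, capacity=50, seed=0):
    """Toy cultural evolution; returns the mean appeal of surviving traditions."""
    rng = random.Random(seed)
    # Hypothetical starting point: every tradition is equally unappealing.
    population = [0.1] * capacity
    for _ in range(generations):
        offspring = []
        for appeal in population:
            # Self-replication: a faithful copy is passed on...
            offspring.append(appeal)
            # ...and variation: a second copy is slightly altered.
            mutated = appeal + rng.uniform(-0.05, 0.05)
            offspring.append(min(1.0, max(0.0, mutated)))
        # Limited environment plus feedback: only the most appealing
        # variants find enough adherents to persist.
        offspring.sort(reverse=True)
        population = offspring[:capacity]
    return sum(population) / len(population)

print(evolve(generations=0))   # mean appeal before any selection
print(evolve())                # mean appeal after 200 generations
```

Nobody in the model designs a better tradition; the mean appeal still climbs, because faithful copies are never lost and improved variants crowd out the rest.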

Social Attachment

While evolution is the key ingredient, that doesn’t rule out some spices to help seal the deal.

We are social creatures, at heart. Like dogs, bats, prairie dogs, and our fellow apes, we rely on teamwork to survive. Not surprisingly, the process of evolution has strengthened that by planting various rewards within us.

[TODO: friendship bonus]

[TODO: parent bonus]

These same rewards could be redirected to other ends, however. Making friends with an imaginary being would convey some of the same rewards granted by hanging around with a real-life friend, only this imaginary being will never talk back to you. Having an imaginary being as a parent would provide a sense of security that a real-life parent could never provide.

Fear of Death

[TODO. But see “Terror Management Theory”]

How Religion Started

All merged to form religion. Hunter-gatherer society had good punishment for misbehavior, in the form of ostracism, but as population grew it became less useful. No police around to enforce rules, so who could? Religion evolved a solution with divine punishment via afterlife and a central authority, which made social organization of large groups much easier. Can see this in tribal spirituality vs. early religions.

Thus: religion is a social structure that benefits members by policing group behavior via a supernatural justice system. Evidence:

  • As countries get wealthier and more secure, religiosity drops off dramatically
  • More likely in less secure nations, such as US (high health bankruptcies, high prison population, low feelings of security)
  • Religion is strongly correlated with large groups; smaller tribes just don’t need it.
  • Belief in god isn’t important, playing along with group is. EG:
  • limited grasp of important religious precepts, ignorance of holy texts
  • ease of ignoring basic codes when impractical (churchgoing stats in US, “believing in belief”)
  • afterlife more common than god
  • emphasis on community and communal ritual, instead of private prayer
  • the highly religious are treated with disdain, like the non-believers
  • wait, what role do true believers play? They make it easier to accept tribal markers, but also raise the bar for the rest; thus a love-hate relationship (“I wish I felt what they did”).

[149]  http://folding.stanford.edu/English/Science

[150]  http://boincstats.com/en/stats/14/project/detail

[151]  http://www.washington.edu/news/2011/11/07/paper-uncovers-power-of-foldit-gamers-strategies/

[152]  http://cosmiclog.nbcnews.com/_news/2011/11/07/8684955-gamers-create-scientific-recipes?lite

[153]  Early drafts used the word “tribalism,” but I found a lot of people took me to task for promoting discrimination against “less advanced” people. I struggled to think of a better name, but even “classism” carried an implication of discrimination. Then it hit me: there is no name for this which is free of discrimination, because by its very nature it is group-based discrimination, no more or less. It isn’t fair to say we were born and bred to discriminate, but it is fair to say all of us have that capacity built-in at the lowest possible level.
[HJH of the FUTURE: I’ve since seen “groupiness” tossed around in the scientific literature.]

[154]  “Human groups and social categories: studies in social psychology,” pg. 91-104, Henri Tajfel, 1981.

[155]  24 different genes have been associated with skin colour, and genitalia come in even more varieties; see here for illustrations: http://intersexroadshow.blogspot.ca/2011/04/intersex-genitalia-illustrated-and.html

[156]  Pew Forum, “U.S. Religious Landscape Survey, 2007.”

[157]  http://www.bostoncommunitychoir.org/history_of_gospel_music.htm

[158]  Lincoln, C. Eric, and Lawrence H. Mamiya. The Black church in the African American experience. Duke University Press Books, 1990.

[159]  TODO

[160]  TODO

The Most Hacked President

The current US President is a beginner’s class in hacking. Let’s rewind back to the end of January.

Lost amid the swirling insanity of the Trump administration’s first week, are the reports of the President’s continued insistence on using his Android phone (a Galaxy S3 or perhaps S4). This is, to put it bluntly, asking for a disaster. President Trump’s continued use of a dangerously insecure, out-of-date Android device should cause real panic. And in a normal White House, it would.

A Galaxy S3 does not meet the security requirements of the average teenager, let alone the purported leader of the free world. The best available Android OS on this phone (4.4) is woefully out-of-date and unsupported. The S4, running 5.0.1, is only marginally better. Without exaggerating, hacking a Galaxy S3 or S4 is the type of project I would assign as homework for my advanced undergraduate classes.

I know, that one’s a bit old, but it nicely bookends more recent reporting.

We also visited two of President Donald Trump’s other family-run retreats, the Trump International Hotel in Washington, D.C., and a golf club in Sterling, Va. Our inspections found weak and open Wi-Fi networks, wireless printers without passwords, servers with outdated and vulnerable software, and unencrypted login pages to back-end databases containing sensitive information.

The risks posed by the lax security, experts say, go well beyond simple digital snooping. Sophisticated attackers could take advantage of vulnerabilities in the Wi-Fi networks to take over devices like computers or smart phones and use them to record conversations involving anyone on the premises.

“Those networks all have to be crawling with foreign intruders, not just [Gizmodo and] ProPublica,” said Dave Aitel, chief executive officer of Immunity, Inc., a digital security company, when we told him what we found.

Worried that your Pringles can will rat you out? Not to worry, planting a pineapple is easy-peasy.

At the White House, visitors must undergo a rigorous background screening before they’re let in the door. Agents scan every visitor’s full name, birth date, Social Security number, city of residence and country of birth.

But at Mar-a-Lago, gaining entry doesn’t require that degree of disclosure. Guests entering the club go through multiple security checkpoints staffed by the Secret Service looking for weapons or other immediate threats. But there’s only one requirement to produce a photo ID, and the club itself does not ask guests to provide their names or other information when they enter through the main wrought-iron gated door.

The club also serves as a venue for ticketed public events. Hosts for the slate of political and charity dinners booked at the president’s part-time home from now to the end of the club’s season in May told POLITICO the only request for information about attendees has come from the club itself. And all they’re asked to provide is a name, not additional information that can be used for Secret Service background checks in the event the president is in residence.

I’m a bit shocked no-one has tried blackmailing Trump yet. Maybe there are too many people jockeying for that honor? The WiFi networks probably look like a Spy vs. Spy comic by now.

Abolish Gender, unless it’s convenient for us

I was mulling over a post on Meghan Murphy, someone I’d heard about via Bill C-16, when I noticed Shiv beat me to it and did a much better job than I could. She even makes the same point I would have reached for:

… socialization cannot be both something that is possible to reject–as these feminists do with feminine gender roles–and also inevitable destiny. These are obviously mutually exclusive states. That women buck against the subordination expected of them by patriarchs is plain evidence that these socialized experiences are not fixed points of references but experiences that can be continuously and willfully re-contextualized. And if that’s the case, so-called “male socialization”–the standard idea of which does not map neatly to trans women’s experiences–is not as useful if one’s intention is to drive a wedge between cis and trans womanhood. That this observation is seldom accounted for in the TERF mythology speaks to its importance in these kinds of narratives.

This bugged me when I first learned of TERFs; I found it bizarre that they simultaneously argued gender is fluid like water, yet sticks to you like superglue.

… if anatomy is so strongly associated with a tendency to violence, how can you hope to improve things by destroying the concept of “gender?” …  I have yet to see a single TERF with a self-coherent view of sex/gender. That’s because their “criticism” isn’t actually a critique, based on solid evidence and analysis, but a fig leaf to disguise their bigotry.

I prefer Shiv’s phrasing, though, and her post covers a lot more than one note. Give it a boo.

Proof from Popularity (1)

Proof from Popularity

Some things never go out of style.

The Sun always rises in the East and sets in the West. The seasons come and go in an orderly manner. Tides rise and fall; there’s never a miscommunication.

We always seem to have a god around. The vast majority of human beings, living or dead, believe or believed in one or more gods. The details differ, of course, but not the desire.

Nothing else in our cultures has been as permanent. Traditions get created, changed, lost, and revived all the time. In the United States, an ancient fertility festival has become an excuse to eat chocolate. In Japan the tradition of Seppuku, or ritual suicide by slicing open one’s stomach, has died out. Norway has largely given up blót, which consisted of hanging various animals (including humans) and creating a feast from their flesh. Fondue was revived by Swiss wine and cheese producers, to encourage people to buy more wine and cheese. Something similar happened in the United States in the 1930’s; diamond producers had an excess of diamonds, so they hired marketers to create more demand by linking marriage proposals to the gift of a diamond ring.

Doesn’t the continuous popularity of religion speak to the existence of a higher power?

Bridge Jumping

Many of us were taught at a young age that just because something is popular doesn’t mean it’s right. Humans, like other social creatures, tend to form packs or tribes with a hierarchy of power. We reinforce these groupings through shared behaviour, by grooming one another or parcelling out food.

So if a high-ranking member does something notable, like harass someone not in the clan, there’s an incredible amount of pressure to imitate them. Our culture has decided that this instinct should be resisted,[146] so we try to teach children to think of the greater good instead. Wrong is wrong, no matter how popular it is.

This idea persists into adulthood. Think of the people you consider moral heroes. I’m willing to bet that while their neighbours cried yes, they said no. Oskar Schindler is praised for saving a thousand Jews while his peers were hunting them down. From the opposite end, the Nuremberg trials sent out a clear message that “I’m just following orders” is not an excuse; if a superior commands you to do something immoral, or everyone else in your unit is committing vile acts, you must refuse to go with the crowd. Otherwise, you are as guilty as them.

Therefore, we try not to judge the truth of things based on popularity. Adding a special exemption for religion is a poor idea. The non-religious[147] are currently the third most popular “religion,” after Islam and Christianity, and have never held a larger proportion of the world’s population. Does this mean a god is less likely to exist as time goes on? If Europe were hit by a giant meteor, wiping out a large chunk of the non-religious, does this mean religion is now more truthful?

Judgement Day

So if we can’t judge religion to be useful by how popular it is, how can we judge it?

No, wait, we have another question to answer first: can we judge religion? The religious claim to be above the fray, after all, pulling from a divine mandate of some sort that secular people lack. Doesn’t this make them impossible to judge?

I’d be more swayed by this argument if there was only one religion in the world. Instead we find thousands of religions, many of them splintered into various sects. How will you decide which religion to follow, without judging one against the other? If you dodge that by saying you worship all faiths, even though you don’t follow all of their must-follow rules, then I have some bad news:

Strive against the disbelievers and the hypocrites! Be harsh with them. Their ultimate abode is hell, a hapless journey’s end.

(Qur’an, verse 9:73)

He that sacrificeth unto any god, save unto Jehovah only, shall be utterly destroyed.

(Old Testament, Exodus 22:20, American Standard translation)

If you worship any religion other than Islam, you will suffer eternally. If you worship any religion other than Judaism,[148] you’ll be killed. Worship both Islam and Judaism, or neither religion, and you’ll have both fates. Ignore one or both of these lines and you’ve placed your moral judgement above god’s, since both of these sources are divinely-inspired words from a god.

Before reading this book, you were forced to make a judgement on religion. Since I presume you’re still alive and in reasonably good health, I think that signals it’s A-OK to judge religion in general.

What criterion should we use for judgement? I’d argue that the best way is through behaviour. All religions tell their adherents how to live a moral, just life. We should expect the religious to live better than their godless counterparts, perhaps by having to deal with less crime or consistently coming up happier in surveys.

In the Pragmatic Argument I consider this, and reject it.

The Ascent of Religion

If you agree with my assessment in Pragmatic, though, we’re left with an unsettling conclusion. If religion is not that useful, why do so many people insist on being religious? Couldn’t that imply we must believe in something, or that we’re being compelled to join religion by something external?

I think I can answer this by sharing my theory of how religion got started in the first place.

I’m not the first to come up with a theory, not by a long shot: for instance, Edward Burnett Tylor had a reasonable one back in 1871. Modern theories tend to fall into a few categories, such as those that invoke evolution:

Perhaps the most basic question is whether the trait is an adaptation that evolved by a process of selection. Does a given element of religion exist because it helps an entity (such as an individual or a group) survive and reproduce better than competing entities? If so, then we need to determine the relevant entity. Does the given element of religion increase the fitness of whole groups, compared to other groups (between-group selection), or by increasing the fitness of individuals compared to other individuals within the same group (within-group selection)? With cultural evolution there is an interesting third possibility. A cultural trait can spread by benefiting whole groups or individuals within groups, but it can also spread by enhancing its own transmission at the expense of human individuals and groups, as if it were a parasitic organism in its own right (Dawkins 2006, Dennett 2006). The concept of religion as a disease is highly novel against the background of traditional religious scholarship.

If a trait is not an adaptation, it can nevertheless persist in the population for a variety of reasons. Perhaps it was adaptive in the past but no longer in the present. For example, our eating habits make excellent sense in a world of food scarcity but have become a major cause of death in modern fast-food environments. Perhaps some elements of religion are like obesity—adaptive in the tiny social groups of our ancestral past, but not in modern mega-societies (Alexander 1987).

Alternatively, a trait can be a non-adaptive byproduct of another trait. An architectural example made famous by Stephen Jay Gould and Richard Lewontin (1979) is a spandrel, the triangular space that inevitably forms when two arches are placed next to each other. Arches have a function but spandrels do not, although they can acquire a secondary function such as a decorative space. As a biological example, moths use celestial light sources to navigate (an adaptation) but this causes them to spiral inward toward earthly light sources such as a streetlamp or flame—a highly destructive byproduct. Perhaps some elements of religion are like a moth to flame (Dawkins 2006).

Finally, a trait can have no effect whatsoever on survival and reproduction and simply drift into the population. Many genetic mutations are selectively neutral, enabling them to be used as a molecular “clock” for measuring the amount of time that species have been genetically isolated from each other. Some elements of religion might similarly have no rhyme or reason, other than the vagaries of chance.

(“Evolutionary Religious Studies (ERS): A Beginner’s Guide,” David Sloan Wilson and William Scott Green, draft copy dated September 12th, 2007)

Others point to psychology. Religion could be a cultural system to ease our fears, or a proto-science that satisfied our curiosity and need for explanations before we thought up science proper.

Religion is primarily a search for security and not a search for truth. Religion is what we so often use to bank the fires of our anxiety. That is why religion tends toward becoming excessive, neurotic, controlling and even evil. That is why a religious government is always a cruel government. People need to understand that questioning and doubting are healthy, human activities to be encouraged not to be feared. Certainty is a vice not a virtue. Insecurity is something to be grasped and treasured. A true and healthy religious system will encourage each of these activities. A sick and fearful religious system will seek to remove them.

(“Q&A on biblical criticism,” John Shelby Spong, a weekly mailing dated June 15th, 2005)

The idea that religion is an early form of science is found in many Enlightenment authors, usually with the implication that it has now been replaced by science. Moderate versions of this thesis are found in Auguste Comte and Émile Durkheim. Primarily, however, it was the British anthropologists of religion Edward B. Tylor and James Frazer who defended this view. On the basis of a cognitively oriented associationist psychology, they identified religion with early forms of rational and, especially, scientific thought. For them, religion represented an insufficient answer to cognitive problems such as the explanation of dreams or death. Religion and magic were related in the same way as theory and practice or science and technology. This tradition is represented today by anthropologists such as Robin Horton, who maintains that “primitive” religion is primarily a rational attempt to interpret the world.

(“The promise of salvation: a theory of religion,” pg 56-57, Martin Riesebrodt and Steven Rendal, 2010)

My own theory is primarily evolutionary, but borrows freely from both branches. Religion likely emerged from five separate elements, two of which are optional.


[146]  I agree. We have other, less destructive ways to define and foster groups. Our tendency to live close to each other and our toolmaking skills can fan small flares into big fires.

[147]  This category lumps people who are “spiritual but not religious” in with atheists and agnostics. If you only consider the latter two to be truly non-religious, then the atheist/agnostic stance becomes the fifth most popular “religion” in the world.

[148]  Christianity includes the Old Testament in its bible. Does this mean Christians would kill Jews for refusing to worship the same god, even though they wrote that rule?

A Trump Controversy, in Tweets

Donald Trump:
Crooked Hillary Clinton and her team “were extremely careless in their handling of very sensitive, highly classified information.” Not fit!

Washington Post:
President Trump’s disclosures jeopardized a critical source of intelligence on the Islamic State, officials said

CBS News:
“Highly damaging”: Ex-CIA deputy director on WaPo report that Pres. Trump revealed classified info to Russians

TheUnsilentMAJORITY:
Think about this… Lavrov & Kislyak given classified info from #Trump bc his need for their approval is stronger than his loyalty to U.S

Matthew Chapman:
Lavrov will share the classified info Trump gave him with the Syrians and the Iranians. Americans fighting in the region are going to die.

Ricky Davila:
Just to be clear, Reuters, NYT, & Buzzfeed have all confirmed the #WaPo‘s report about trump giving highly classified info to the Russians.

Adrian Carrasquillo:
Per @TreyYingst, Bannon, Mike Dubke, Sarah Sanders and Spicer walked into cabinet room just now. They did not look happy.
Can now hear yelling coming from room where officials are.
WH comms staffers just put the TVs on super loud after we could hear yelling coming from room w/ Bannon, Spicer, Sanders

Hayley Byrd:
Dianne Feinstein exits Senate subway and is surrounded by reporters. “Oh my goodness. What’s happened?” (She hasn’t seen the WaPo story.)
Lindsey Graham tells us the WaPo report is “troubling” if true. I ask him if it’s only troubling. “Yeah, because I don’t know if it’s true.”
I wonder how many GOP senators will say they’re troubled before calling for more information.

Thomas Burr:
Asked whether @jasoninthehouse still trusts Trump with classified info, Chaffetz says, “Of Course.”

Scott Wong:
.@SpeakerRyan spox on WaPo story: “The speaker hopes for a full explanation of the facts from the administration.”

Alice Ollstein:
.@SenatorRisch defends Trump revealing classified info to the Russians: “It’s no longer classified the minute he utters it.”

Yashar:
Hannity right now: “Clinton Email Server Scandal”

Kurt Schlichter:
So: HR McMaster, author of Dereliction of Duty, sat back as Trump disgorged critical classified info, then went outside and lied about it?

The Baxter Bean:
Self-serving Republicans ignoring Trump gave highly classified info to foreign adversaries in the WH, but here’s what they said about email

Tony Posnanski:
“He defended Trump when he gave the Russians classified security info!” – The opening line to everyone running against GOP in 2018


Al Weaver:
MCCONNELL react to Wapo story: “We could do with a little less drama from the White House.”
Full quote. [this is worth clicking through, trust me – HJH]

Norah O’Donnell:
“We had lengthy interactions w/ White House all day yesterday. McMaster never said it was false until after it was published” @gregpmiller

Donald Trump:
As President I wanted to share with Russia (at an openly scheduled W.H. meeting) which I have the absolute right to do, facts pertaining….
…to terrorism and airline flight safety. Humanitarian reasons, plus I want Russia to greatly step up their fight against ISIS & terrorism.
I have been asking Director Comey & others, from the beginning of my administration, to find the LEAKERS in the intelligence community…..

P-hacking is No Big Deal?

Possibly not. simine vazire argued the case over at “sometimes i’m wrong.”

The basic idea is as follows: if we use shady statistical techniques to indirectly adjust the p-value cutoff in Null Hypothesis Significance Testing, or NHST, we’ll up the rate of false positives. Just to put some numbers to this, a p-value cutoff of 0.05 means that when the null hypothesis is true, we’ll get a bad sample about 5% of the time and wrongly conclude the effect is real. If we use p-hacking to get an effective cutoff of 0.1, however, then that number jumps up to 10%.
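As a rough illustration of those two numbers, here is a minimal Python sketch (not the code behind this post) that draws test statistics under a true null and counts how often they clear each cutoff. The function names, the seed, and the use of a plain two-sided z-test are assumptions made for the example.

```python
import random
from statistics import NormalDist


def null_p_values(n_studies, seed=0):
    """Two-sided p-values for z-scores drawn while the null hypothesis is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    return [2 * (1 - std_normal.cdf(abs(rng.gauss(0, 1))))
            for _ in range(n_studies)]


def share_significant(p_values, cutoff):
    """Fraction of studies that would be declared significant at this cutoff."""
    return sum(p <= cutoff for p in p_values) / len(p_values)


p_values = null_p_values(50_000)
honest = share_significant(p_values, 0.05)   # nominal cutoff
hacked = share_significant(p_values, 0.10)   # effective cutoff after p-hacking
print(f"p <= 0.05: {honest:.3f}   p <= 0.10: {hacked:.3f}")
```

Because p-values are uniformly distributed when the null is true, the two fractions should land close to the cutoffs themselves.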

However, p-hacking will also raise the number of true positives we get. How much higher it gets can be tricky to calculate, but this blog post by Erika Salomon gives out some great numbers. During one simulation run, a completely honest test of a false null hypothesis would return a true positive 12% of the time; when p-hacking was introduced, that skyrocketed to 74%.

If the increase in false positives is balanced out by the increase in true positives, then p-hacking makes no difference in the long run. The number of false positives in the literature would be entirely dependent on the power of studies, which is abysmally low, and our focus should be on improving that. Or, if we’re really lucky, the true positives increase faster than the false positives and we actually get a better scientific record via cheating!

We don’t really know which scenario will play out, however, and vazire calls for someone to code up a simulation.

Allow me.

My methodology will be to divide studies up into two categories: null results that are never published, and possibly-true results that are. I’ll be using a one-way ANOVA to check whether the averages of two groups drawn from a Gaussian distribution differ. I debated switching to a Student t test, but comparing two random draws seems more realistic than comparing one random draw to a fixed mean of zero.

I need a model of effect and sample sizes. This one is pretty tricky; just because a study is unpublished doesn’t mean the effect size is zero, and vice-versa. Making inferences about unpublished studies is tough, for obvious reasons. I’ll take the naive route here, and assume unpublished studies have an effect size of zero while published studies have effect sizes on the same order of actual published studies. Both published and unpublished will have sample sizes typical of what’s published.

I have a handy cheat for that: the Open Science Collaboration published a giant replication of 100 psychology studies back in 2015, and being Open they shared the raw data online in a spreadsheet. The effect sizes are in correlation coefficients, which are easy to convert to Cohen’s d, and when paired with a standard deviation of one that gives us the mean of the treatment group. The control group’s mean is fixed at zero but shares the same standard deviation. Sample sizes are drawn from said spreadsheet, and represent the total number of samples and not the number of samples per group. In fact, it gives me two datasets in one: the original study effect and sample size, plus the replication’s effect and sample size. Unless I say otherwise, I’ll stick with the originals.
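For the conversion step, a small Python sketch may help. It uses the usual textbook formula d = 2r / sqrt(1 - r²), which assumes equal group sizes; the post does not say which formula was actually used, and the function names and the example value r = 0.6 are made up for illustration.

```python
import math


def r_to_cohens_d(r):
    """Convert a correlation coefficient to Cohen's d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r * r)


def group_means(r, sd=1.0):
    """Control mean is fixed at zero; the treatment mean sits d standard deviations away."""
    return 0.0, r_to_cohens_d(r) * sd


# r = 0.6 is a hypothetical value, not one taken from the OSC spreadsheet.
control_mean, treatment_mean = group_means(0.6)
print(control_mean, round(treatment_mean, 3))
```

With a standard deviation of one, the treatment mean and Cohen's d are the same number, which is why the pairing described above works.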

P-hacking can be accomplished a number of ways: switching between the number of tests in the analysis and iteratively doing significance tests are but two of the more common. To simplify things I’ll just assume the effective p-value is a fixed number, but explore a range of values to get an idea of how a variable p-hacking effect would behave.

For some initial values, let’s say unpublished studies constitute 70% of all studies, and p-hacking can cause a p-value threshold of 0.05 to act like a threshold of 0.08.

Octave shall be my programming language of choice. Let’s have at it!

(Template: OSC 2015 originals)
With a 30.00% success rate and a straight p <= 0.050000, the false positive rate is 12.3654% (333 f.p, 2360 t.p)
Whereas if p-hacking lets slip p <= 0.080000, the false positive rate is 18.2911% (548 f.p, 2448 t.p)

(Template: OSC 2015 replications)
With a 30.00% success rate and a straight p <= 0.050000, the false positive rate is 19.2810% (354 f.p, 1482 t.p)
Whereas if p-hacking lets slip p <= 0.080000, the false positive rate is 26.2273% (577 f.p, 1623 t.p)
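The percentages in these printouts appear to be the share of false positives among all significant results, f.p / (f.p + t.p), rather than the share of null studies that came up significant. A quick Python check of that reading, using only the counts printed above (the function name and labels are mine):

```python
def false_positive_share(false_pos, true_pos):
    """Fraction of significant results that are false positives."""
    return false_pos / (false_pos + true_pos)


# (f.p, t.p) counts copied from the printouts above.
runs = {
    "originals, p <= 0.05": (333, 2360),
    "originals, p <= 0.08": (548, 2448),
    "replications, p <= 0.05": (354, 1482),
    "replications, p <= 0.08": (577, 1623),
}

shares = {label: 100 * false_positive_share(fp, tp)
          for label, (fp, tp) in runs.items()}

for label, share in shares.items():
    print(f"{label}: {share:.4f}%")
```

The four values reproduce the printed percentages, so "false positive rate" here means the proportion of positive findings that are false.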

Ouch, our false positive rate went up. That seems strange, especially as the true positives (“t.p.”) and false positives (“f.p.”) went up by about the same amount. Maybe I got lucky with the parameter values, though; let’s scan a range of unpublished study rates from 0% to 100%, and effective p-values from 0.05 to 0.2. The actual p-value threshold will remain fixed at 0.05. So we can fit it all in one chart, I’ll take the proportion of p-hacked false positives and subtract the vanilla proportion from it, so that areas where the false positive rate goes down after hacking are negative.

How varying the proportion of unpublished/false studies and the p-hacking amount changes the false positive rate.

There are no values less than zero?! How can that be? The math behind these curves is complex, but I think I can give an intuitive explanation.

Drawing the distribution of p-values when the result is null vs. the results from the OSC originals.

The diagonal is the distribution of p-values when the effect size is zero; the curve is what you get when it’s greater than zero. As there are more or fewer values in each category, the graphs are stretched or squashed horizontally. The p-value threshold is a horizontal line, and everything below that line is statistically significant. The proportion of false to true results is equal to the proportion between the lengths of that horizontal line from the origin.

P-hacking is the equivalent of nudging that line upwards. The proportions change according to the slope of the curve. The steeper it is, the less it changes. It follows that if you want to increase the proportion of true results, you need to find a pair of horizontal lines where the horizontal distance increases as fast or faster in proportion to the increase along that diagonal. Putting this geometrically, imagine drawing a line starting at the origin but at an arbitrary slope. Your job is to find a slope such that the line pierces the non-zero effect curve twice.

Slight problem: that non-zero effect curve has negative curvature everywhere. The slope is guaranteed to get steeper as you step up the curve, which means it will curve up and away from where the line crosses it. Translating that back into math, it’s guaranteed that the non-zero effect curve will not increase in proportion with the diagonal. The false positive rate will always increase as you up the effective p-value threshold.
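The same conclusion can be checked numerically without any random draws. The Python sketch below is a stand-in for the ANOVA simulation, not a reproduction of it: it swaps in a two-sided z-test under a normal approximation, and the effect size `D = 0.5` and `N_PER_GROUP = 25` are hypothetical values rather than numbers from the OSC spreadsheet.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(alpha, d, n_per_group):
    """Chance a two-sided z-test detects effect size d (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected z-score under the effect
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def false_positive_share(alpha, null_fraction, d, n_per_group):
    """Expected fraction of significant results that come from null studies."""
    false_pos = null_fraction * alpha
    true_pos = (1 - null_fraction) * power(alpha, d, n_per_group)
    return false_pos / (false_pos + true_pos)


D, N_PER_GROUP = 0.5, 25                                   # hypothetical values
null_fractions = [i / 10 for i in range(1, 10)]            # 10% .. 90% null studies
effective_cutoffs = [0.05 + 0.01 * i for i in range(1, 16)]  # 0.06 .. 0.20

# p-hacked share minus the honest (p <= 0.05) share, over the whole grid.
differences = [
    false_positive_share(a, f, D, N_PER_GROUP)
    - false_positive_share(0.05, f, D, N_PER_GROUP)
    for f in null_fractions
    for a in effective_cutoffs
]
print(f"smallest change in false positive share: {min(differences):.4f}")
```

Under these assumptions every difference on the grid is positive, in line with the chart: loosening the effective cutoff adds false positives faster, proportionally, than it adds true ones.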

And thus, p-hacking is always a deal.

The Cry of the Bigot

Hmph, yet again I find myself late to the party. Shiv has an excellent article up on Jesse Singal.

Back when Singal first started cluelessly meandering into trans issues, virtually every trans feminist academic I read approached him with kiddie gloves. Julia Serano gave an interview with him to help orient his slant on a Ken Zucker piece in relation to empirical evidence–he declined to use any of the information she provided. Same thing with Parker Molloy, who goes to great lengths to avoid calling Singal transphobic despite his omission of Molloy’s attempt to introduce the evidence to him. A blogger by the pseudonym of Cerberus has meticulously documented Singal’s foray into trans issues, and spends several years trying to patiently explain the sheer amount of denialism necessary to maintain the opinions Singal defends.

The chain of causality is a bit convoluted. Rebecca Tuvel wrote a clueless article comparing “transracialism” to gender identity. Some academics popped up to say “you missed the boat, and here’s why.”[2] Singal responded with, in part:

This is a witch hunt. There has simply been an explosive amount of misinformation circulating online about what is and isn’t in Tuvel’s article, which few of her most vociferous critics appear to have even skimmed, based on their inability to accurately describe its contents.

Yeeeah. There’s meatier arguments within Singal’s article, but the histrionics are well out of line. Myers noticed this too, but I want to highlight the hyperbole as a warning flag.

[9:35] HARRIS: The purpose of the podcast was to set the record straight, because I find the dishonesty and hypocrisy and moral cowardice of Murray’s critics shocking, and the fact that I was taken in by this defamation of him and effectively became part of a silent mob that was just watching what amounted to a modern witch-burning, that was intolerable to me. So it is with real pleasure (and some trepidation) that I bring you a very controversial conversation, on points about which there is virtually no scientific controversy. […]

In thinking about the frenzied monstering of me on Freethought Blogs over the past few weeks, I realized I must have been laboring under a misapprehension all the time I was there. I thought it was a network that was partly about thinking – thinking as such, thinking as a value, thinking as a goal and a pursuit and a method. I knew it was about other things too, of course, especially secularism and atheism and also progressive causes, but I did think it put the “thought” part front and center. […]

I think Freethought Blogs the network has taken a hard turn to anti-intellectualism for the sake of absolutist political commitment. I think political commitments need to be accompanied by thinking.

Benson in particular makes a fine example of this, as not only has she endorsed describing any pushback against transphobia as “witch hunts,” she’s also mocked people for playing the “witch hunt” card and hosted a co-blogger who speaks out against actual witch hunts. It’s amazing to watch the ease with which she pulls out hyperbole right to this day, to paint herself as the victim of a vast conspiracy of the blind.

One of the things I loathe most about the “SHUN HER NOW” school of non-thought is the way it forbids all that and insists that thinking has to be replaced with formulas and that the formulas have to be repeated exactly or dire punishment will follow. In short I loathe the banning of thought and probing and questions. I think I knew I couldn’t stay at FTB any longer when the goons started mocking me for daring to say it made a difference whether we were talking about ontology or politics. Fucking hell, if we can’t make distinctions as basic as that how can we think at all?

Back in the day, I pointed out this feeds “into the heightened emotions and paranoia Benson needs to keep other people (and perhaps herself) from looking at the evidence.” It is the cry of the bigot: hyperbolic and emotionally charged, so as to drive out self-reflection and critical thought. Watch for it.

Proof from Design, or the Teleological Proof (6)

I Fought The Law

To some, this is a huge contradiction.

Scientists have been studying heat for as long as we’ve had fire, and from that effort have produced the Laws of Thermodynamics,[139] plus the concepts of entropy and systems. You can think of a system as a really sturdy container, like a pot holding a nice clay sculpture. If the container is closed, you won’t be able to touch the sculpture or tip it out; likewise, a “closed” system is completely isolated from everything else. The analogy doesn’t quite fit; since the container isn’t thermally closed, you can heat up the sculpture by heating up the container. In a proper closed system there’s nothing you can do from the outside to affect the inside, and vice-versa.

This sculpture probably doesn’t take up all of the container, though. The remainder is likely “air,” the random hodge-podge of atmospheric molecules that were floating around inside until the container was closed. This creates a clear boundary, between the air molecules and the clay molecules; we’ll call this a “low” entropy state, which means there is a lot of order present. Suppose we abuse this analogy a little, and kick the container off a cliff. The sculpture would likely shatter, becoming less organized itself. If we repeat this a few million times, you’ll wind up with fine clay dust mingling with the air itself. There’s no longer any sort of boundary or order here, which means it’s now a “high” entropy state.