The Sinmantyx Posts

It started off somewhere around here.

Richard Dawkins: you’re wrong. Deeply, profoundly, fundamentally wrong. Your understanding of feminism is flawed and misinformed, and further, you keep returning to the same poisonous wells of misinformation. It’s like watching creationists try to rebut evolution by citing Kent Hovind; do you not understand that that is not a trustworthy source? It’s a form of motivated reasoning, in which you keep returning to those who provide the comfortable reassurances that your biases are actually correct, rather than challenging yourself with new perspectives.

Just for your information, Christina Hoff Sommers is an anti-feminist. She’s spent her entire career inventing false distinctions and spinning fairy tales about feminism.

In the span of a month, big names in the atheo-skeptic community like Dawkins, Sam Harris, and DJ Grothe lined up to endorse Christina Hoff Sommers as a feminist. At about the same time, Ayaan Hirsi Ali declared “We must reclaim and retake feminism from our fellow idiotic women,” and the same people cheered her on. Acquaintances of mine who should have known better defended Sommers and Ali, and I found myself arguing against brick walls. Enraged that I was surrounded by the blind, I did what I always do in these situations.

I researched. I wrote.

The results were modest and never widely circulated, but they caught the eye of M.A. Melby. She offered me a guest post at her blog, and I promised to append more to what I had written. And append I did.

After that was said and done, Melby left me a set of keys and said I could get comfortable. I was officially a co-blogger. I started pumping out blog posts, and never really looked back. Well, almost; out of all that I wrote over at Sinmantyx, that first Christina Hoff Sommers piece has consistently been the most popular.
I’ll do the same thing here as with my Sinmantyx statistics posts: keep the originals intact and in place, and create an archive over here.

The Sinmantyx Statistics Posts

Some of my fondest childhood memories were of reading Discover Magazine and National Geographic in my grandfather’s basement. He more than anyone cultivated my interest in science, and having an encyclopedia for a dad didn’t hurt either. This led to a casual interest in statistics, which popped up time and again as the bedrock of science.

Jumping ahead a few years, writing Proof of God led me towards the field of epistemology, or how we know what we know. This fit neatly next to my love of algorithms and computers, and I spent many a fun afternoon trying to assess and break down knowledge systems. I forget exactly how I was introduced to Bayesian statistics; I suspect I may have stumbled across a few articles by chance, but it’s also possible Richard Carrier’s cheerleading was my first introduction. Either way, I began studying the subject with gusto.

By the time I’d started blogging over at Sinmantyx, I had a little experience with the subject and I was dying to flex it. And so Bayesian statistics became a major theme of my blog posts, to the point that I think it deserves its own section.

Speaking of which, I’ve decided to backdate any and all Sinmantyx posts that I re-post over here. There was never any real “publication date” for Proof of God, as it was never published and I constantly went back and revised it over the years I spent writing it, so I feel free to assign any date I want to those posts. The opposite is true of my Sinmantyx work, so I’ll defer to the original publication dates. This does create a problem in finding these posts, as more than likely they’ll never make the RSS feed. Not to worry: I’ll use this blog post to catalog them, so just bookmark this or look for it along my blog header.


Replication Isn’t Enough

I bang on about statistical power because low power indirectly raises the odds of a false positive. In brief, low power forces you to run more tests to reach a statistical conclusion, stuffing the file drawer and thus making published results appear more certain than they are. In detail, see John Borghi or Ioannidis (2005). In comic, see Maki Naro.
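
The arithmetic behind that claim fits in a few lines. Here’s a minimal Python sketch of the “positive predictive value” calculation from Ioannidis (2005); the prior odds of 0.25 are an invented illustration, not an empirical estimate:

```python
def ppv(power, alpha=0.05, prior_odds=0.25):
    """Positive predictive value: the probability that a statistically
    significant finding reflects a true effect (after Ioannidis 2005).
    prior_odds is the assumed ratio of true to false hypotheses tested."""
    return power * prior_odds / (power * prior_odds + alpha)

for power in (0.8, 0.2):
    print(f"power = {power:.1f} -> PPV = {ppv(power):.2f}")
# power = 0.8 -> PPV = 0.80
# power = 0.2 -> PPV = 0.50, i.e. half the "discoveries" are false
```

Drop the power and the fraction of false “discoveries” climbs, even though the significance threshold never moved.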

The concept of statistical power has been known since 1928, the wasteful consequences of low power since 1962, and yet there’s no sign that scientists are upping their power levels. This is a representative result:

Our results indicate that the average statistical power of studies in the field of neuroscience is probably no more than between ~8% and ~31%, on the basis of evidence from diverse subfields within neuroscience. If the low average power we observed across these studies is typical of the neuroscience literature as a whole, this has profound implications for the field. A major implication is that the likelihood that any nominally significant finding actually reflects a true effect is small.

Button, Katherine S., et al. “Power failure: why small sample size undermines the reliability of neuroscience.” Nature Reviews Neuroscience 14.5 (2013): 365-376.

The most obvious consequence of low power is a failure to replicate. If you rarely try to replicate studies, you’ll be blissfully unaware of the problem; once you take replications seriously, though, you’ll suddenly find yourself in a “replication crisis.”

You’d think this would result in calls for increased statistical power, with the occasional call for a switch to a methodology that automatically incorporates power. But the crisis has also led to calls for more replications.

As a condition of receiving their PhD from any accredited institution, graduate students in psychology should be required to conduct, write up, and submit for publication a high-quality replication attempt of at least one key finding from the literature, focusing on the area of their doctoral research.
Everett, Jim A. C., and Brian D. Earp. “A Tragedy of the (Academic) Commons: Interpreting the Replication Crisis in Psychology as a Social Dilemma for Early-Career Researchers.” Frontiers in Psychology 6 (2015).


Much has been made of preregistration, publication of null results, and Bayesian statistics as important changes to how we do business. But my view is that there is relatively little value in appending these modifications to a scientific practice that is still about one-off findings; and applying them mechanistically to a more careful, cumulative practice is likely to be more of a hindrance than a help. So what do we do? …

Cumulative study sets with internal replication.

If I had to advocate for a single change to practice, this would be it.

There’s an intuitive logic to this: currently less than one in a hundred papers is a replication of prior work, so there’s plenty of room for expansion; key figures like Ronald Fisher and Jerzy Neyman emphasized the necessity of replications; it doesn’t require any modification of technique; and the “replication crisis” is primarily about replications. It sounds like an easy, feel-good solution to the problem.

But then I read this paper:

Smaldino, Paul E., and Richard McElreath. “The Natural Selection of Bad Science.” arXiv preprint arXiv:1605.09511 (2016).

It starts off with a meta-analysis of meta-analyses of power, and comes to the same conclusion as above.

We collected all papers that contained reviews of statistical power from published papers in the social, behavioural and biological sciences, and found 19 studies from 16 papers published between 1992 and 2014. … We focus on the statistical power to detect small effects of the order d=0.2, the kind most commonly found in social science research. …. Statistical power is quite low, with a mean of only 0.24, meaning that tests will fail to detect small effects when present three times out of four. More importantly, statistical power shows no sign of increase over six decades …. The data are far from a complete picture of any given field or of the social and behavioural sciences more generally, but they help explain why false discoveries appear to be common. Indeed, our methods may overestimate statistical power because we draw only on published results, which were by necessity sufficiently powered to pass through peer review, usually by detecting a non-null effect.
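
To put that mean power of 0.24 in context, here’s a quick calculation of what a two-sample t-test can detect at d = 0.2, using statsmodels; the sample sizes are arbitrary illustrations:

```python
from statsmodels.stats.power import TTestIndPower

power_of = TTestIndPower().power
for n in (50, 100, 200, 394):
    # two-sided, two-sample t-test, small effect (d = 0.2), alpha = 0.05
    pw = power_of(effect_size=0.2, nobs1=n, alpha=0.05)
    print(f"n = {n:3d} per group -> power = {pw:.2f}")
# roughly 0.17, 0.29, 0.51, and 0.80 respectively: detecting a small
# effect 80% of the time takes almost 400 subjects per group
```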

Rather than leave it at that, though, the researchers decided to simulate the pursuit of science. They set up various “labs” that exerted different levels of effort to maintain methodological rigor, killed off labs that didn’t publish much and replaced them with mutations of labs that published more, and set the simulation spinning.
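
The full model has a lot of knobs, but the core selection dynamic fits in a few lines. This is my own toy sketch, not the authors’ code; the payoff function, mutation size, and population size are invented stand-ins:

```python
import random

POP, GENS, BASE_RATE = 100, 3000, 0.1   # labs, generations, share of true hypotheses

def expected_publications(effort):
    """Labs with low effort run more studies and get more (often false)
    positive results; only positive results are assumed publishable."""
    attempts = 2.0 - effort                       # low effort -> more attempts
    false_pos = 0.05 + 0.45 * (1.0 - effort)      # low effort -> more false positives
    p_positive = BASE_RATE * 0.8 + (1 - BASE_RATE) * false_pos
    return attempts * p_positive

random.seed(1)
labs = [random.random() for _ in range(POP)]      # each lab is just an effort level
for _ in range(GENS):
    scores = [expected_publications(e) * random.random() for e in labs]  # noisy payoffs
    worst, best = scores.index(min(scores)), scores.index(max(scores))
    # Selection: the least productive lab is replaced by a mutated copy
    # of the most productive one.
    labs[worst] = min(1.0, max(0.0, labs[best] + random.gauss(0, 0.01)))

print(f"mean effort after selection: {sum(labs) / POP:.2f}")  # drifts toward zero
```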

We ran simulations in which power was held constant but in which effort could evolve (μw=0, μe=0.01). Here selection favoured labs who put in less effort towards ensuring quality work, which increased publication rates at the cost of more false discoveries … . When the focus is on the production of novel results and negative findings are difficult to publish, institutional incentives for publication quantity select for the continued degradation of scientific practices.

That’s not surprising. But then they started tinkering with replication rates. To begin with, replications were done 1% of the time, were guaranteed to be published, and having one of your results fail to replicate would exact a terrible toll.

We found that the mean rate of replication evolved slowly but steadily to around 0.08. Replication was weakly selected for, because although publication of a replication was worth only half as much as publication of a novel result, it was also guaranteed to be published. On the other hand, allowing replication to evolve could not stave off the evolution of low effort, because low effort increased the false-positive rate to such high levels that novel hypotheses became more likely than not to yield positive results … . As such, increasing one’s replication rate became less lucrative than reducing effort and pursuing novel hypotheses.

So it was time for extreme measures: force the replication rate to high levels, to the point that 50% of all studies were replications. All that happened was that it took longer for the overall methodological effort to drop and false positives to bloom.

Replication is not sufficient to curb the natural selection of bad science because the top performing labs will always be those who are able to cut corners. Replication allows those labs with poor methods to be penalized, but unless all published studies are replicated several times (an ideal but implausible scenario), some labs will avoid being caught. In a system such as modern science, with finite career opportunities and high network connectivity, the marginal return for being in the top tier of publications may be orders of magnitude higher than an otherwise respectable publication record.

Replication isn’t enough. The field of science needs to incorporate more radical reforms that encourage high methodological rigor and greater power.

Veritasium on the Reproducibility Crisis

It’s a great summary, going into much more depth than most. I really like how Muller brought out a concrete example of publication bias, and found an example of p-hacking in a branch of science that’s usually resistant to it, physics.

But I’m not completely happy with it. Some of this comes from being a Bayesian fanboi who never heard the topic mentioned, but Muller also makes a weird turn of phrase at the end: he argues that, as bad as the flaws in science may be, they’re far worse in all our other systems of learning about the world.

Slight problem: there are no other systems. Even “I feel it’s true” is based on an evidential claim, evaluated for plausibility against other competing hypotheses. The weighting procedure may be hopelessly skewed, but so too are p-values and the publication process.

Muller could have strengthened his point by bringing up an example, yet did not. We’re left taking his word that science isn’t the sole methodology we have for exploring the world, and that those alternate methodologies aren’t as rigorous. Meanwhile, he explicitly points out that only a small fraction of “landmark cancer trials” could be replicated; this implies that cancer treatments, and by extension the well-being of millions of cancer patients, are being harmed by poor methodology in science. Even if you disagree with my assertion that all epistemologies are scientific in some fashion, it’s tough to find a counter-example that affects 40% of us and will kill a quarter.

My hope doesn’t come from a blind assurance that other methodologies are worse than science, it comes from the news that scientists have recognized the flaws in their trade, and are working to correct them. To be fair to Muller, he’d probably agree.

What is False?

John Oliver weighed in on the replication crisis, and I think he did a great job. I’d have liked a bit more on university press departments, who can write misleading press releases that journalists jump on, but he did have to simplify things for a lay audience.

It got me thinking about what “false” means, though. “True” is usually defined as “in line with reality,” so “false” should mean “not in line with reality,” the precise complement.

But don’t think about it in terms of a single claim; think of multiple data points applied to a specific hypothesis. Suppose we analyze that data, and find that all but a few data points are predicted by the hypothesis we’re testing. Does this mean the hypothesis is false, since it isn’t in line with reality in all cases, or true, because it’s more in line with reality than not? Falsification argues that it is false, and exploits that to come up with this epistemology:

  1. Gather data.
  2. Is that data predicted by the hypothesis? If so, repeat step 1.
  3. If not, replace this hypothesis with another that predicts all the data we’ve seen so far, and repeat step 1.
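
This loop is simple enough to express directly in code. A minimal sketch, with toy hypotheses and data invented for illustration:

```python
def falsificationist(pool, observations):
    """Naive falsification: keep a hypothesis until an observation
    contradicts it, then adopt the next one consistent with all data so far."""
    seen = []
    current = pool[0]
    for obs in observations:                      # 1. gather data
        seen.append(obs)
        if current(obs):                          # 2. predicted? carry on
            continue
        current = next(h for h in pool            # 3. otherwise, replace
                       if all(h(o) for o in seen))
    return current

# Toy example: "all swans are white" versus "swans are white or black."
all_white = lambda swan: swan == "white"
white_or_black = lambda swan: swan in ("white", "black")
survivor = falsificationist([all_white, white_or_black],
                            ["white", "white", "black", "white"])
print(survivor is white_or_black)   # True: one black swan falsified the first
```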

That’s what I had in mind when I said that frequentism works on streams of hypotheses, hopping from one “best” hypothesis to the next. The addition of time changes the original definitions slightly, so that “true” really means “in line with reality in all instances” while “false” means “in at least one instance, it is not in line with reality.”

Notice the asymmetry, though. A hypothesis has to reach a pretty high bar to be considered “true,” and “false” hypotheses range from “in line with reality, with one exception” to “never in line with reality.” Some of those “false” hypotheses are actually quite valuable to us, as John Oliver’s segment demonstrates. He never explains what “statistical significance” means, for instance, but later on uses “significance” in the “effect size” sense. This will mislead most of the audience away from the reality of the situation, and in the absolute it makes his segment “false.” Nonetheless, that segment was a net positive at getting people to understand and care for the replication crisis, so labeling it “false” is a disservice.

We need something fuzzier than the strict binary of falsification. What if we didn’t complement “true” in the set-theory sense, but in the definitional sense? Let “true” remain “in line with reality in all instances,” but change “false” from “in at least one instance, not in line with reality” to “never in line with reality.” This creates a gap, though: that hypothesis from earlier is neither “true” nor “false,” as it isn’t true in all cases nor false in all. It must sit in a third category, as part of some sort of paraconsistent logic.

This is where the Bayesian interpretation of statistics comes in: it deliberately disclaims an absolute “true” or “false” label for descriptions of the world, instead holding those up as the two ends of a continuum. Every hypothesis sits in the third category in between, hoping that future data will reveal it’s closer to one end of the continuum or the other.
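
As a toy illustration of that continuum, here’s Bayes’ rule nudging a hypothesis toward one end as data arrives; the coin and the flips are invented:

```python
def update(prior, like_h, like_alt):
    """One step of Bayes' rule: P(H | datum) from P(H) and the two likelihoods."""
    return prior * like_h / (prior * like_h + (1 - prior) * like_alt)

# H: "this coin lands heads 70% of the time," versus the alternative: "it's fair."
p = 0.5                          # start agnostic, midway along the continuum
for flip in "HHTHHHTH":
    p = update(p, 0.7, 0.5) if flip == "H" else update(p, 0.3, 0.5)
print(f"P(biased | data) = {p:.2f}")   # about 0.73: leaning "true," but never 1 or 0
```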

I think it’s a neat way to view the Bayesian/Frequentism debate, as a mere disagreement over what “false” means.

Index Post: P-values

Over the months, I’ve managed to accumulate a LOT of papers discussing p-values and their application. Rather than have them rot on my hard drive, I figured it was time for another index post.

Full disclosure: I’m not in favour of them. But I came to that by reading these papers, and seeing no effective counter-argument. So while this collection is biased against p-values, that’s no more a problem than a bias against the luminiferous aether or humour theory. And don’t worry, I’ll include a few defenders of p-values as well.

What’s a p-value?

It’s frequently used in “null hypothesis significance testing,” or NHST to its friends. A null hypothesis is one you hope to refute, preferably a fairly established one that other people accept as true. That hypothesis will predict a range of observations, some more likely than others. A p-value is simply the probability of some observed event happening, plus the probability of all events more extreme, assuming the null hypothesis is true. You can then plug that value into the following logic:

  1. Event E, or an event more extreme, is unlikely to occur under the null hypothesis.
  2. Event E occurred.
  3. Ergo, the null hypothesis is false.
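
For a concrete example of the calculation itself, here’s the classic coin-flip case, hand-rolled rather than taken from any particular library:

```python
from math import comb

def p_value(k, n, p_null=0.5):
    """One-sided p-value: the probability of seeing k or more heads in
    n flips, assuming the null hypothesis (heads with probability p_null)."""
    return sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i)
               for i in range(k, n + 1))

# 16 heads in 20 flips of a supposedly fair coin:
print(f"p = {p_value(16, 20):.4f}")   # 0.0059, well below the usual 0.05 cutoff
```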

They seem like a weird thing to get worked up about.

Significance testing is a cornerstone of modern science, and NHST is the most common form of it. A quick check of Google Scholar finds that “p-value” shows up 3.8 million times, while its primary competitor, “Bayes Factor,” shows up 250,000 times. At the same time, it’s poorly understood.

The P value is probably the most ubiquitous and at the same time, misunderstood, misinterpreted, and occasionally miscalculated index in all of biomedical research. In a recent survey of medical residents published in JAMA, 88% expressed fair to complete confidence in interpreting P values, yet only 62% of these could answer an elementary P-value interpretation question correctly. However, it is not just those statistics that testify to the difficulty in interpreting P values. In an exquisite irony, none of the answers offered for the P-value question was correct, as is explained later in this chapter.

Goodman, Steven. “A Dirty Dozen: Twelve P-Value Misconceptions.” In Seminars in Hematology, 45:135–40. Elsevier, 2008. http://www.sciencedirect.com/science/article/pii/S0037196308000620.

The consequence is an abundance of false positives in the scientific literature, leading to many failed replications and wasted resources.

Gotcha. So what do scientists think is wrong with them?

Well, th-

And make it quick, I don’t have a lot of time.

Right right, here’s the top three papers I can recommend:

Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.

Nickerson, Raymond S. “Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy.” Psychological Methods 5, no. 2 (2000): 241.

After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including near universal misinterpretation of p as the probability that H₀ is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H₀ one thereby affirms the theory that led to the test.

Cohen, Jacob. “The Earth Is Round (p < .05).” American Psychologist 49, no. 12 (1994): 997–1003. doi:10.1037/0003-066X.49.12.997.

This chapter examines eight of the most commonly voiced objections to reform of data analysis practices and shows each of them to be erroneous. The objections are: (a) Without significance tests we would not know whether a finding is real or just due to chance; (b) hypothesis testing would not be possible without significance tests; (c) the problem is not significance tests but failure to develop a tradition of replicating studies; (d) when studies have a large number of relationships, we need significance tests to identify those that are real and not just due to chance; (e) confidence intervals are themselves significance tests; (f) significance testing ensures objectivity in the interpretation of research data; (g) it is the misuse, not the use, of significance testing that is the problem; and (h) it is futile to reform data analysis methods, so why try?

Schmidt, Frank L., and J. E. Hunter. “Eight Common but False Objections to the Discontinuation of Significance Testing in the Analysis of Research Data.” What If There Were No Significance Tests, 1997, 37–64.

OK, I have a bit more time now. What else do you have?

Using a Bayesian significance test for a normal mean, James Berger and Thomas Sellke (1987, pp. 112–113) showed that for p values of .05, .01, and .001, respectively, the posterior probabilities of the null, Pr(H₀ | x), for n = 50 are .52, .22, and .034. For n = 100 the corresponding figures are .60, .27, and .045. Clearly these discrepancies between p and Pr(H₀ | x) are pronounced, and cast serious doubt on the use of p values as reasonable measures of evidence. In fact, Berger and Sellke (1987) demonstrated that data yielding a p value of .05 in testing a normal mean nevertheless resulted in a posterior probability of the null hypothesis of at least .30 for any objective (symmetric priors with equal prior weight given to H₀ and HA) prior distribution.

Hubbard, R., and R. M. Lindsay. “Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing.” Theory & Psychology 18, no. 1 (February 1, 2008): 69–88. doi:10.1177/0959354307086923.
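
That discrepancy is easy to check with a simulation. The sketch below assumes a 50/50 prior between the null and an alternative with a standardized effect of 0.5 at n = 50, numbers chosen to loosely echo Berger and Sellke’s setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
sims, n, effect = 1_000_000, 50, 0.5

null_true = rng.random(sims) < 0.5          # 50/50 prior between H0 and H1
means = np.where(null_true, 0.0, effect * np.sqrt(n))
z = rng.normal(means, 1.0)                  # one test statistic per experiment
p = 2 * norm.sf(np.abs(z))                  # two-sided p-values

near_05 = (p > 0.045) & (p <= 0.05)         # experiments landing right at p ≈ .05
print(f"Pr(H0 | p ≈ .05) ≈ {null_true[near_05].mean():.2f}")   # roughly 0.5
```

In other words, of all the experiments that produce a “significant” p just under .05, about half come from a true null.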

Because p-values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is ‘‘Surprisingly little.’’ In one simulation of 25 repetitions of a typical experiment, p varied from .44. Remarkably, the interval—termed a p interval —is this wide however large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference.

Cumming, Geoff. “Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better.” Perspectives on Psychological Science 3, no. 4 (July 2008): 286–300. doi:10.1111/j.1745-6924.2008.00079.x.

Simulations of repeated t-tests also illustrate the tendency of small samples to exaggerate effects. This can be shown by adding an additional dimension to the presentation of the data. It is clear how small samples are less likely to be sufficiently representative of the two tested populations to genuinely reflect the small but real difference between them. Those samples that are less representative may, by chance, result in a low P value. When a test has low power, a low P value will occur only when the sample drawn is relatively extreme. Drawing such a sample is unlikely, and such extreme values give an exaggerated impression of the difference between the original populations. This phenomenon, known as the ‘winner’s curse’, has been emphasized by others. If statistical power is augmented by taking more observations, the estimate of the difference between the populations becomes closer to, and centered on, the theoretical value of the effect size.

Halsey, Lewis G., Douglas Curran-Everett, Sarah L. Vowler, and Gordon B. Drummond. “The Fickle P Value Generates Irreproducible Results.” Nature Methods 12, no. 3 (March 2015): 179–85. doi:10.1038/nmeth.3288.
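
The “winner’s curse” is just as easy to reproduce. This sketch assumes a true standardized effect of 0.5 and tiny groups of 10, numbers chosen purely for illustration:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
true_d, n, sims = 0.5, 10, 20_000
significant = []
for _ in range(sims):
    a = rng.normal(0.0, 1.0, n)               # control group
    b = rng.normal(true_d, 1.0, n)            # treatment group
    if ttest_ind(a, b).pvalue < 0.05:         # only "significant" results survive
        significant.append(b.mean() - a.mean())

print(f"power ≈ {len(significant) / sims:.2f}")                # about 0.18
print(f"mean surviving effect ≈ {np.mean(significant):.2f}")   # roughly double the true 0.5
```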

If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time. If, as is often the case, experiments are underpowered, you will be wrong most of the time. This conclusion is demonstrated from several points of view. First, tree diagrams which show the close analogy with the screening test problem. Similar conclusions are drawn by repeated simulations of t-tests. These mimic what is done in real life, which makes the results more persuasive. The simulation method is used also to evaluate the extent to which effect sizes are over-estimated, especially in underpowered experiments. A script is supplied to allow the reader to do simulations themselves, with numbers appropriate for their own work. It is concluded that if you wish to keep your false discovery rate below 5%, you need to use a three-sigma rule, or to insist on p≤0.001. And never use the word ‘significant’.

Colquhoun, David. “An Investigation of the False Discovery Rate and the Misinterpretation of P-Values.” Royal Society Open Science 1, no. 3 (November 1, 2014): 140216. doi:10.1098/rsos.140216.

I was hoping for something more philosophical.

The idea that the P value can play both of these roles is based on a fallacy: that an event can be viewed simultaneously both from a long-run and a short-run perspective. In the long-run perspective, which is error-based and deductive, we group the observed result together with other outcomes that might have occurred in hypothetical repetitions of the experiment. In the “short run” perspective, which is evidential and inductive, we try to evaluate the meaning of the observed result from a single experiment. If we could combine these perspectives, it would mean that inductive ends (drawing scientific conclusions) could be served with purely deductive methods (objective probability calculations).

Goodman, Steven N. “Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy.” Annals of Internal Medicine 130, no. 12 (1999): 995–1004.

Overemphasis on hypothesis testing–and the use of P values to dichotomise significant or non-significant results–has detracted from more useful approaches to interpreting study results, such as estimation and confidence intervals. In medical studies investigators are usually interested in determining the size of difference of a measured outcome between groups, rather than a simple indication of whether or not it is statistically significant. Confidence intervals present a range of values, on the basis of the sample data, in which the population value for such a difference may lie. Some methods of calculating confidence intervals for means and differences between means are given, with similar information for proportions. The paper also gives suggestions for graphical display. Confidence intervals, if appropriate to the type of study, should be used for major findings in both the main text of a paper and its abstract.

Gardner, Martin J., and Douglas G. Altman. “Confidence Intervals rather than P Values: Estimation rather than Hypothesis Testing.” BMJ 292, no. 6522 (1986): 746–50.

What’s this “Neyman-Pearson” thing?

P-values were part of a method proposed by Ronald Fisher, as a means of assessing evidence. Even as the ink was barely dry on it, other people started poking holes in his work. Jerzy Neyman and Egon Pearson took some of Fisher’s ideas and came up with a new method, based on long-term prediction. Their method is superior, IMO, but rather than replacing Fisher’s approach it instead wound up being blended with it, ditching all the advantages to preserve the faults. This citation covers the historical background:

Huberty, Carl J. “Historical Origins of Statistical Testing Practices: The Treatment of Fisher versus Neyman-Pearson Views in Textbooks.” The Journal of Experimental Education 61, no. 4 (1993): 317–33.

The remainder describe the differences between the two methods, and possible ways to “fix” their shortcomings.

The distinction between evidence (p’s) and error (α’s) is not trivial. Instead, it reflects the fundamental differences between Fisher’s ideas on significance testing and inductive inference, and Neyman-Pearson’s views on hypothesis testing and inductive behavior. The emphasis of the article is to expose this incompatibility, but we also briefly note a possible reconciliation.

Hubbard, Raymond, and M. J. Bayarri. “Confusion over Measures of Evidence (p’s) versus Errors (α’s) in Classical Statistical Testing.” The American Statistician 57, no. 3 (August 2003): 171–78. doi:10.1198/0003130031856.

The basic differences are these: Fisher attached an epistemic interpretation to a significant result, which referred to a particular experiment. Neyman rejected this view as inconsistent and attached a behavioral meaning to a significant result that did not refer to a particular experiment, but to repeated experiments. (Pearson found himself somewhere in between.)

Gigerenzer, Gerd. “The Superego, the Ego, and the Id in Statistical Reasoning.” A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues, 1993, 311–39.

This article presents a simple example designed to clarify many of the issues in these controversies. Along the way many of the fundamental ideas of testing from all three perspectives are illustrated. The conclusion is that Fisherian testing is not a competitor to Neyman-Pearson (NP) or Bayesian testing because it examines a different problem. As with Berger and Wolpert (1984), I conclude that Bayesian testing is preferable to NP testing as a procedure for deciding between alternative hypotheses.

Christensen, Ronald. “Testing Fisher, Neyman, Pearson, and Bayes.” The American Statistician 59, no. 2 (2005): 121–26.

C’mon, there aren’t any people defending the p-value?

Sure there are. They fall into two camps: “deniers,” a small group that insists there’s nothing wrong with p-values, and the much more common “fixers,” who propose making up for the shortcomings by augmenting NHST. Since a number of fixers have already been cited, I’ll just focus on the deniers here.

On the other hand, the propensity to misuse or misunderstand a tool should not necessarily lead us to prohibit its use. The theory of estimation is also often misunderstood. How many epidemiologists can explain the meaning of their 95% confidence interval? There are other simple concepts susceptible to fuzzy thinking. I once quizzed a class of epidemiology students and discovered that most had only a foggy notion of what is meant by the word “bias.” Should we then abandon all discussion of bias, and dumb down the field to the point where no subtleties need trouble us?

Weinberg, Clarice R. “It’s Time to Rehabilitate the P-Value.” Epidemiology 12, no. 3 (2001): 288–90.

The solution is simple and practiced quietly by many researchers—use P values descriptively, as one of many considerations to assess the meaning and value of epidemiologic research findings. We consider the full range of information provided by P values, from 0 to 1, recognizing that 0.04 and 0.06 are essentially the same, but that 0.20 and 0.80 are not. There are no discontinuities in the evidence at 0.05 or 0.01 or 0.001 and no good reason to dichotomize a continuous measure. We recognize that in the majority of reasonably large observational studies, systematic biases are of greater concern than random error as the leading obstacle to causal interpretation.

Savitz, David A. “Commentary: Reconciling Theory and Practice.” Epidemiology 24, no. 2 (March 2013): 212–14. doi:10.1097/EDE.0b013e318281e856.

The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real.

Chow, Siu L. “Précis of Statistical Significance: Rationale, Validity, and Utility.” Behavioral and Brain Sciences 21, no. 2 (April 1998).

Index Post: Rape Myth Acceptance

Apologies for going silent, but I’ve been in crunch mode over a lecture on rape culture. The crunch is over, thankfully, and said lecture has been released in video, transcript, and footnote form.

But one strange thing about it is that I never go into depth on the rape myth acceptance literature. There’s actually a good reason why: after thirty years of research, modern papers don’t even bother with 101 level stuff like “why is this a myth?” or even “how many people believe myth X?”, because it’s been done and covered and consensus has been reached. My intended audience was below the 101 level and hostile to the very notion of “rape culture,” rendering much of the literature useless.

But there is soooooo much literature that it feels like a grave injustice not to talk about it. So, let’s try something special: this will be an index post to said literature. It’ll give you the bare minimum of preamble you need to jump in, and offer a little curation. This will evolve and change over time, too, so check back periodically.

[section on comment policy deleted, for obvious reasons]

What is a “Rape Myth”?

A “rape myth” is pretty self-explanatory: it is a false belief about sexual assault, typically shared by more than one person. Martha Burt’s foundational paper of 1980 includes these, for instance:

“One reason that women falsely report a rape is that they frequently have a need to call attention to themselves.”
“Any healthy woman can successfully resist a rapist if she really wants to.”
“Many women have an unconscious wish to be raped, and may then unconsciously set up a situation in which they are likely to be attacked.”
“If a woman gets drunk at a party and has intercourse with a man she’s just met there, she should be considered “fair game” to other males at the party who want to have sex with her too, whether she wants to or not.”

Other myths include “men cannot be raped” and “if you orgasm, it can’t be rape” (we’re meat machines, and at some point low-level physiology will override high-level cognition).

What papers should I prioritize?

As mentioned, there’s Burt’s 1980 contribution, which goes into great detail about validity and correlations with environmental factors, and developed a questionnaire that became foundational for the field.

The present research, therefore, constitutes a first effort to provide an empirical foundation for a combination of social psychological and feminist theoretical analysis of rape attitudes and their antecedents.

The results reported here have two major implications. First, many Americans do indeed believe many rape myths. Second, their rape attitudes are strongly connected to other deeply held and pervasive attitudes such as sex role stereotyping, distrust of the opposite sex (adversarial sexual beliefs), and acceptance of interpersonal violence. When over half of the sampled individuals agree with statements such as “A woman who goes to the home or apartment of a man on the first date implies she is willing to have sex” and “In the majority of rapes, the victim was promiscuous or had a bad reputation,” and when the same number think that 50% or more of reported rapes are reported as rape only because the woman was trying to get back at a man she was angry with or was trying to cover up an illegitimate pregnancy, the world is indeed not a safe place for rape victims.
Burt, Martha R. “Cultural Myths and Supports for Rape.” Journal of Personality and Social Psychology 38, no. 2 (1980): 217.
http://www.excellenceforchildandyouth.ca/sites/default/files/meas_attach/burt_1980.pdf

But there’s also the Illinois Rape Myth Acceptance Scale, developed twenty years later and benefiting greatly from that.

First, we set out to systematically elucidate the domain and structure of the rape myth construct through reviewing the pertinent literature, discussion with experts, and empirical investigation. Second, we developed two scales, the 45-item IRMA and its 20-item short form (IRMA-SF), designed to reflect the articulated domain and structure of the rape myth construct, as well as to possess good psychometric properties. Finally, whereas content validity was determined by scale development procedures, construct validity of the IRMA and IRMA-SF was examined in a series of three studies, all using different samples, methodologies, and analytic strategies. […]

This work revealed seven stable and interpretable components of rape myth acceptance labeled (1) She asked for it; (2) It wasn’t really rape; (3) He didn’t mean to; (4) She wanted it; (5) She lied; (6) Rape is a trivial event; and (7) Rape is a deviant event. […]

individuals with higher scores on the IRMA and IRMA-SF were also more likely to (1) hold more traditional sex role stereotypes, (2) endorse the notion that the relation of the sexes is adversarial in nature, (3) express hostile attitudes toward women, and (4) be relatively accepting of both interpersonal violence and violence more generally.
Payne, Diana L., Kimberly A. Lonsway, and Louise F. Fitzgerald. “Rape Myth Acceptance: Exploration of Its Structure and Its Measurement Using the Illinois Rape Myth Acceptance Scale.” Journal of Research in Personality 33, no. 1 (March 1999): 27–68. doi:10.1006/jrpe.1998.2238.

What else is interesting?

There was marked variability (…) among studies in their reported relationships between RMA and attitudinal factors related with gender and sexuality (…). Not surprisingly, however, large overall effect sizes with a positive direction were found with oppressive and adversarial attitudes against women, such as attitudes toward women (…), combined measures of sexism (…), victim-blaming attitudes (…), acceptance of interpersonal violence (…), low feminist identity (…), and adversarial sexual beliefs (…). Decision latency (i.e., estimated time for a woman to say no to sexual advances), hostility toward women, male sexuality, prostitution myth, therapists’ acceptance of rape victim scale, sexual conservatism, vengeance, and sociosexuality (i.e., openness to multiple sexual partners) were examined in one study each, and their effect sizes ranged between medium to large and were all significantly larger than zero. Homophobia had a significant moderate effect size (…) as well as male-dominance attitude (…), acceptance of rape (…), and violence (…). However, profeminist beliefs (…), having sexual submission fantasies (…), and male hostility (…) were negatively related to RMA.
Suarez, E., and T. M. Gadalla. “Stop Blaming the Victim: A Meta-Analysis on Rape Myths.” Journal of Interpersonal Violence 25, no. 11 (November 1, 2010): 2010–35. doi:10.1177/0886260509354503.
http://474miranairresearchpaper.wmwikis.net/file/view/metaanalysisstopblamingvictim.pdf

Results of a multiple regression analysis indicated that sexism, ageism, classism, and religious intolerance each were significant predictors of rape myth acceptance (all p < 0.01; … ). Racism and homophobia, however, failed to enter the model. Sexism, ageism, classism, and religious intolerance accounted for almost one-half (45%) of the variance in rape myth acceptance for the present sample. Sexism accounted for the greatest proportion of the variance (35%). The other intolerant beliefs accounted for relatively smaller amounts of variance beyond that of sexism: classism (2%), ageism (2%), and religious intolerance (1%).
Aosved, Allison C., and Patricia J. Long. “Co-Occurrence of Rape Myth Acceptance, Sexism, Racism, Homophobia, Ageism, Classism, and Religious Intolerance.” Sex Roles 55, no. 7–8 (November 28, 2006): 481–92. doi:10.1007/s11199-006-9101-4.
http://www.researchgate.net/publication/226582617_Co-occurrence_of_Rape_Myth_Acceptance_Sexism_Racism_Homophobia_Ageism_Classism_and_Religious_Intolerance/file/72e7e52bd021d8bc72.pdf

We did not find any effect of participant’s gender on rape attributions. Our results confirm those obtained by other authors (Check & Malamuth, 1983; Johnson & Russ, 1989; Krahe, 1988) who haven’t found significant gender effects on rape perception when situational factors were manipulated. Our results also contradict the general finding that men hold more rape myths than women do (Anderson et al., 1997). Our data indicate that it is not the observer’s gender that determines rape attributions but his or her preconceptions about rape. Thus, the influence of gender on rape attributions might be mediated by RMA, which then might explain why some studies reveal a significant gender effect (Monson et al., 1996; Stormo et al., 1997).
Frese, Bettina, Miguel Moya, and Jesús L. Megías. “Social Perception of Rape How Rape Myth Acceptance Modulates the Influence of Situational Factors.” Journal of Interpersonal Violence 19, no. 2 (February 1, 2004): 143–61. doi:10.1177/0886260503260245.
http://www.d.umn.edu/cla/faculty/jhamlin/3925/4925HomeComputer/Rape%20myths/Social%20Perception.pdf

The current research further corroborates the role of rape myths as a factor facilitating sexual aggression. Taken together, our findings suggest that salient ingroup norms may be important determinants of the professed willingness to engage in sexually aggressive behavior. Our studies go beyond quasi-experimental and correlational work that had shown a close relationship between RMA and rape proclivity [RP] as well as our own previous experimental studies, which have shown individuals’ RMA to causally affect RP. They demonstrate that salient information about others’ RMA may cause differences in men’s self-reported proclivity to exert sexual violence.
Bohner, Gerd, Frank Siebler, and Jürgen Schmelcher. “Social Norms and the Likelihood of Raping: Perceived Rape Myth Acceptance of Others Affects Men’s Rape Proclivity.” Personality and Social Psychology Bulletin 32, no. 3 (2006): 286–97.
http://www.d.umn.edu/cla/faculty/jhamlin/3925/4925HomeComputer/Rape%20myths/Social%20Norms.pdf

Rape myth acceptance and time of initial resistance appeared to be determining factors in the assignment of blame and perception of avoidability of a sexual assault for both men and women. Consistent with the literature, women in this study obtained a lower mean rape myth acceptance score than men. As hypothesized, men and women with low rape myth acceptance attributed significantly less blame to the victim and situation, more blame to the perpetrator, and were less likely to believe the assault could have been avoided. Likewise, when time of initial resistance occurred early in the encounter, men and women attributed significantly less blame to the victim and situation, more blame to the perpetrator, and were less likely to believe the sexual assault could have been avoided.

The hypothesis that traditional gender-role types (masculine and feminine) would be more likely to blame the victim following an acquaintance rape than nontraditional gender-role types (androgynous and undifferentiated) was unsupported.
Kopper, Beverly A. “Gender, Gender Identity, Rape Myth Acceptance, and Time of Initial Resistance on the Perception of Acquaintance Rape Blame and Avoidability.” Sex Roles 34, no. 1–2 (January 1, 1996): 81–93. doi:10.1007/BF01544797.
http://www.researchgate.net/profile/Allison_Aosved/publication/226582617_Co-occurrence_of_Rape_Myth_Acceptance_Sexism_Racism_Homophobia_Ageism_Classism_and_Religious_Intolerance/links/02e7e52bd021d8bc72000000.pdf

Given that callous sexual attitudes permit violence and consider women as passive sexual objects, it follows that for men who endorse these, sexual aggression becomes an appropriate and accepted expression of masculinity. In this sense, using force to obtain intercourse does not become an act of rape, but rather an expression of hypermasculinity, which may be thought of as a desirable disposition in certain subcultures. Taken together, these research findings suggest that an expression of hypermasculinity through callous sexual attitudes may relate to an inclination to endorse a behavioral description (i.e., using force to hold an individual down) versus referring to a sexually aggressive act as rape. Hence, we hypothesize that the construct of callous sexual attitudes will be found at the highest levels in those men who endorse intentions to force a woman to sexual acts but deny intentions to rape.
Edwards, Sarah R., Kathryn A. Bradshaw, and Verlin B. Hinsz. “Denying Rape but Endorsing Forceful Intercourse: Exploring Differences among Responders.” Violence and Gender 1, no. 4 (2014): 188–93.

The majority of participants were classified as either sexually coercive (51.4%) or sexually aggressive (19.7%) based on the most severe form of sexual perpetration self-reported on the SEQ or indicated in criminal history information obtained from institutional files. Approximately one third (33.5%) of coercers and three fourths (76%) of aggressors endorsed the use of two or more tactics for obtaining unwanted sexual contact on the SEQ. Although 63.4% of sexually aggressive men were classified based on their self-reported behavior on the SEQ alone, another 31% were classified on the basis of criminal history information indicating a prior sexual offense conviction involving an adult female, or on the agreement of both sources (5.6%). Notably, 90.1% of sexually aggressive men also reported engaging in lower level sexually coercive behaviors.
DeGue, S., D. DiLillo, and M. Scalora. “Are All Perpetrators Alike? Comparing Risk Factors for Sexual Coercion and Aggression.” Sexual Abuse: A Journal of Research and Treatment 22, no. 4 (December 1, 2010): 402–26. doi:10.1177/1079063210372140.

The tactics category reported most frequently was sexual arousal, with 65% of all participants being subjected to at least one experience. Within this category, persistent kissing and touching was the most cited tactic (62% of all participants). Emotional manipulation and deception was the next most frequently reported category, with 60% of participants being subjected to at least one experience. Within this category, participants cited the specific tactics of repeated requests (54%) and telling lies (34%) most often. Intoxication was the third most frequently reported category, with 38% of all participants being subjected to at least one tactic. More participants reported being taken advantage of while already intoxicated (37%) than being purposely intoxicated (19%). The category with the lowest frequency of reports was physical force and harm, with 28% of participants being subjected to at least one tactic.
Struckman-Johnson, Cindy, David Struckman-Johnson, and Peter B. Anderson. “Tactics of Sexual Coercion: When Men and Women Won’t Take No for an Answer.” The Journal of Sex Research 40, no. 1 (February 1, 2003): 76–86.

HJH 2015-02-08: Bolded comment policy, to increase the chance of it being read.
HJH 2015-10-31: Added a few more papers, relating to sexual coercion and hostility.

My Little Takedown of Christina Hoff Sommers

[Guest blogger HJ Hornbeck, here! This originally started off as a reply to someone’s comment, but it’s been greatly expanded and stands on its own. A hat tip to Ophelia Benson is in order, too, for providing some of the raw material via her blog, as well as for giving me the platform.]

Who is Christina Hoff Sommers? Let’s start off with one of her former employers, the Independent Women’s Forum, where she once served on the board. Wikipedia offers this summary of them:

The Independent Women’s Forum (IWF) is a politically conservative American non-profit organization focused on policy issues of concern to women. The IWF was founded by activist Rosalie Silberman to promote a “conservative alternative to feminist tenets” following the controversial Supreme Court nomination of Clarence Thomas in 1992.

The group advocates “equity feminism,” a term first used by IWF author Christina Hoff Sommers to distinguish “traditional, classically liberal, humanistic feminism” from “gender feminism”, which she claims opposes gender roles as well as patriarchy. According to Sommers, the gender feminist view is “the prevailing ideology among contemporary feminist philosophers and leaders” and “thrives on the myth that American women are the oppressed ‘second sex.’” Sommers’ equity feminism has been described as anti-feminist by critics.

But if you know Sommers at all, you probably know of her through her connection to the American Enterprise Institute.

The American Enterprise Institute for Public Policy Research (AEI) is an extremely influential, pro-business, think tank founded in 1943 by Lewis H. Brown. It promotes the advancement of free enterprise capitalism and its people have served in influential governmental positions. It is the base for many neo-conservatives. […]

In February 2007, The Guardian (UK) reported that AEI was offering scientists and economists $10,000 each, “to undermine a major climate change report” from the United Nations Intergovernmental Panel on Climate Change (IPCC). AEI asked for “articles that emphasise the shortcomings” of the IPCC report, which “is widely regarded as the most comprehensive review yet of climate change science.” AEI visiting scholar Kenneth Green made the $10,000 offer “to scientists in Britain, the US and elsewhere,” in a letter describing the IPCC as “resistant to reasonable criticism and dissent.”

The Guardian reported further that AEI “has received more than $1.6m from ExxonMobil, and more than 20 of its staff have worked as consultants to the Bush administration. Lee Raymond, a former head of ExxonMobil, is the vice-chairman of AEI’s board of trustees,” added The Guardian.

They too are active opponents of feminism.

According to an April Newsweek profile, much of AEI’s recent influence has to do with Arthur C. Brooks, … who has been its president since 2009. (A $20 million donation from a Roman Catholic founder of the Carlyle Group probably didn’t hurt, either.) “He’s the message man,” Pema Levy wrote of Brooks. “He may not be a pollster, but Republicans say he possesses a gift for making conservative policies sound appealing.” Newsweek focused on the ways Brooks is nudging conservatives toward less flagrantly uncompassionate policies on poverty. But, judging from these op-eds, the AEI is also employing the most sophisticated techniques to date in the much-discussed Republican “war on women.”

For starters, they’ve put a female face on it. AEI scholar and The War Against Boys author Christina Hoff Sommers has a new “vlog” series, “The Factual Feminist” (as opposed to us fantasy feminists), in which she seeks to invalidate feminist discourse. […]

there is something especially insidious about a woman and self-described feminist like Sommers providing anti-feminist talking points. Her claim that “feminist activists have convinced many young women that a foolish, drunken hookup was actually rape” sounds a lot more credible than, say, Todd Akin’s “legitimate rape” distinction, despite meaning essentially the same thing: What women call rape isn’t really that big a deal.

So far, all we see are anti-feminist far-right think tanks. Here’s one exception, though, Prager University:

Dennis Prager is a neoconservative radio host, professional tone troll, and conspiracy theorist who believes that the United States is a Christian nation, and that it’s under attack from “secular leftists” who control the media, universities, public education system, and other institutions. Despite being a fairly extreme conservative, to the point of being a weekly WND columnist, he does moderate on certain issues such as abortion and, to his credit, he does seem to know quite a bit about religion and aspects of United States history. […]

He has also started his own non-profit online program called Prager University which, keeping up with his paranoia around universities turning students into secular bisexual leftists, has the totally not bizarre motto “Undoing the damage of the University… five minutes at a time.” It actually presents history and politics from a hard-right point of view, which includes rampant New Deal denialism, promotion of the Laffer curve, Europhobia, and an off the walls weird interpretation of liberalism.

Her contributions have consisted of a series of videos openly hostile to feminism, such as:

Women in America are the freest in the world, yet many feminists tell us women are oppressed. They advocate this falsehood through victim mentality propaganda and misleading statistics, such as the gender wage gap myth. In five minutes, American Enterprise Institute’s Christina Hoff Sommers tells you the truth about feminism.

So who is Christina Hoff Sommers? While she may bill herself as the “Factual Feminist”, her history suggests she’s a right-wing shill who uses her platform to spread misinformation about feminism, in the hope of opposing social change. I think she’s taking something of an embrace, extend, and extinguish approach: pretend to join up with what you oppose, but alter it to be superficially similar yet quite different and use a mix of money and rhetoric to bury the original version.

Yeah, the above’s a bit of an ad hominem, but I can fix that easily enough by looking at Sommers’s actual arguments. Take her recent video defending GamerGate.

You read that correctly, she’s defending GamerGate:

Well, take it from “Based Mom:” GamerGate overall is a voice for moderation in today’s fevered debates over sex and gender.

“Based Mom” is the nickname GamerGaters have bestowed on Sommers, incidentally. She shows up frequently as a target of affection, earning a place in their fan art, and is considered a leader. But what exactly is GamerGate? Sommers offers this summary:

#GamerGate is a Twitter hashtag, and it attracts gamers from all over the world, males and females, liberals, conservatives, black, white, straight, gay, trans… Some gamers identify with the hashtag because they believe there is too much corruption in gaming. Others are weary of cultural critics who evaluate video games through the prism of gender politics.

That narrative leaves out critical details, though. We have chat logs that show it’s also a coordinated movement plotting to spread hate and lies about women who talk about gender issues in games, with the help of an ex-boyfriend of one of their targets. In one such log, for instance, one member discusses driving Zoe Quinn to suicide, to general agreement, while another frets about keeping up the facade:

Opfag: I’m debating whether or not we should just attack zoe
Opfag: turn her into a victim
Opfag: let her cry and take it further
NASA_Agent: she’s already a victim
OtherGentleman: She’s a professional victim
NASA_Agent: it was real in her mind
ebola-chan: She’s victimizing herself.
Opfag: push her… push her further….. further, until eventually she an heroes
Silver|2: She’s a professional victim. She doesn’t do it for free
OtherGentleman: She can’t even into depression. What makes you think she has the balls to kill herself?
Opfag: I kind of want to just make her life irrepairably horrible
Opfag: At this point.
rd0951: ^^
rd0951: like i siad
NASA_Agent: but what if she suicides
Opfag: Good.
Opfag: Then we get to troll #Rememberzoe
NASA_Agent: #disarmcyberbullies2014
Opfag: And milk the lulcow corpse
OtherGentleman: The more you try to attack her directly, the more she gets to play the victim card and make a bunch of friends who will support her because, since she has a vagina, any attack is misgony
rd0951: ./v should be in charge of the gaming journalism aspect of it. /pol should be in charge of the feminism aspect, and /b should be in charge of harassing her into killing herself
Opfag: I agree.
BurntKimchi: #banassultburgersandfries
NASA_Agent: you don’t see this kind of unity often
Opfag: You don’t
Opfag: We really must be at war
Silver|2: It’s happening

We also have the posts where they come up with journalistic integrity as a cover for their bigotry:

This is a fun interesting story. I’ve been keeping track since the beginning but I think the lot of us are too scattered about what this should really be about. It shouldn’t be about a psycho slut who fucked 5 guys and hurt some betas feelings. I think the focus should be more on that this chick is using sex to climb her way through the ranks of the gaming industry, all while spewing an ideology that she does not believe nor follow.

We need to focus on the fact that she:
>Fucked journalists/game reviewers in order to give the game she designed, positive feedback.
>She fucked her current boss who is married. This is obviously bad, neither her or her boss should be allowed to keep their current job.
>She is a hypocrite that claims a very specific “feminist gamer” ideology and then 180s and has sex with everyone to get what she wants.

We need to expose her as a hypocrite and a liar. The cheating part is just a bonus, yes she’s a slut but there are tons of sluts out there. There is actually proof that she is getting leverage in her career by using sex and that is a travesty and a corruption.

Dude exactly yes.Thus far all that’s happening EVERYWHERE is we’re getting our threads deleted-from giantbomb, from reddit, and from 4chan itself.

What we need to do is bring a true discussion to the table. We need to ignore the dirty laundry between the beta and his slut girlfriend and bring to the table the discussion of “how close is too close when dealing with gaming press and game developers relationships as well as the relationships female game devs have with their superiors”.

Further proof comes from examining what happens on the #Gamergate hashtag, where the majority of discussion is not about ethics at all. We even have archives of GamerGaters inventing a hashtag as a false front, hoping to enlist well-meaning but gullible people to divide and conquer feminists:

Anonymous Wed 03 Sep 2014 03:56:48 No.261346918
WHO /MINORITY/ HERE? I’m like 2/3 of the things these faggots say they are fighting for, and when I engage them on Twitter (WITH MY FUCKING PERSONAL ACCOUNT) they ignore me. Jesus Christ this is getting frustrating, I might as well be a white male for these faggots.

Anonymous Wed 03 Sep 2014 03:57:44 No.261347051
You fuckers need to organize with your own hashtag and take a stand

Anonymous Wed 03 Sep 2014 03:59:14 No.261347271

>>261347051
>>261346918
Something like
>#NotYourShield
And demand the SJWs stop using you as a shield to deflect genuine criticism

Anonymous Wed 03 Sep 2014 04:31:01 No.261349447
>>261346918
>>261347271
#GamerGate + #NotYourShield is an excellent combination. Use it for talking about how you’re for GamerGate but nobody will admit you’re not white, cis and straight.

SPREAD IT

anonDorf: #notyourshield backup squad reporting in
Albel: mah nigga
Albel: retweet the hell out of that shit
Guest55872: I am non-cis, non-white, non-male
AnimeJustice: Can I use #notyourshield regardless?
Guest55872: Albel, you need my selfie?
Albel: Nah, I’m good bud.
Guest55872: Albel, asking, ’cause I do not tweet
foTTS: Use #gamergate and #notyourshield at the same time, pls Albel anonDorf AnimeJustice
Guest55872: anonDorf, want mines?
Albel: FoTTS: Sadly, I don’t fall under any of the #notyourshield categories but I’ll put it in there where I can
foTTS: spread the word about notyourshield Albel
anonDorf: Yeah why not
Guest55872: NICe

codeswish: yea, femfreq is easy PR, you forget that sending her a nice tweet gets them lots of retweets from her followers
Albel: codeswish: That’s fine. You know, maybe part of #gamergate is that we should not demonize femfreq
Albel: “Hey, I don’t necessarily with @femfreq but we here in #gamergate don’t condone the harassment.”
codeswish: The Sarkeesian Effect will handle it for us
W334800: Anita and Zoe are passive aggressive competing or victim-queen
AAAAaaaaAAAA: someone needs to set those 2 attention whores against each other
randompleb: that’s a brilliant idea
Guest55872: ^^
randompleb: two black holes eating each other
AAAAaaaaAAAA: find a way to make the ZQ followers hostile towards the AS followers

This coordinated assault has had real consequences:

The next day, my Twitter mentions were full of death threats so severe I had to flee my home. They have targeted the financial assets of my company by hacking. They have tried to impersonate me on Twitter. Even as we speak, they are spreading lies to journalists via burner e-mail accounts in an attempt to destroy me professionally.

We’ve lost too many women to this lunatic mob. Good women the industry was lucky to have, such as Jenn Frank, Mattie Bryce and my friend Samantha Allen, one of the most insightful critics in games media. They decided the personal cost was too high, and I don’t know who could blame them.

Every woman I know in the industry is terrified she will be next.

GamerGate, in short, is a hate group. While there may be positive elements within it, we have good reason to expect they are being, or will be, exploited by the negative ones.

Which returns us to Christina Hoff Sommers:

Now, I discovered GamerGate when I was working on my recent video about sexism in games. Now in that video, I pointed out that the evidence does not support the claim that video games cause violence or misogyny. I mean gaming has surged since the early 1990’s, but youth crime has plummeted. And Millennials who were born and raised in “video game nation,” they are far less sexist, homophobic, bigoted than older generations.

Note the bait and switch? Sommers swiftly transitions from discussing sexism to discussing violence, racism, and homophobia. She jumps from talking about video games to talking about youth crime, as if the former were the greatest predictor of the latter. It’s not.[1] If her case were solidly in line with the facts, she would never have to engage in such verbal sleight-of-hand; Sommers would just duly report the facts, pointing to an existing body of research demonstrating an accurate, balanced portrayal of women in video games.

She doesn’t, because she can’t. In 2007, a group of researchers looked at video game cover art.[2] Why not the games themselves? Because different people have different skill levels across different genres, making it difficult to capture the full range of game content in a statistical sample. Plus,

the covers are available for anyone to see, whether they are experienced or not; the covers are easily viewed by those not even interested in playing. For example, video games are usually just one aisle away from the movies in a rental store. Games are not organized by rating so the games rated for mature audiences are often display together with games meant for younger children. There is nothing keeping young children from being exposed to the images on the M-rated games even if they are only seeking an E-rated game. Lastly, for many people the decision to purchase, play, or allow a child to play a game maybe based largely on the material portrayed on the cover.

They found that men were portrayed three times as often as women: men appeared on 9 of 10 game covers, while women appeared on only 4 of 10. Men were five times more likely than women to have a primary role on the cover, and four times more likely to have a secondary role. That’s not a typo; since men so heavily outnumbered women on the covers, they dominated almost every statistic. The main exception was objectification: 2 in 10 women in a primary role were sexually objectified, for instance, while not a single man was. I recommend reading the full study, as I’ve just skimmed off a fraction of the details.

This isn’t an isolated finding, either,[3] [4] [5] [6] yet Sommers is completely ignorant of the research around gaming. She’s also outright lying:

in the earlier video I pointed out that gamers were being blamed for issuing death threats, even though no-one knows who sent them

This is not true.

[Brianna] Wu, who has written about the harassment against women in gaming, has long been critical of the recently-formed Gamergate movement and what she and others have seen as the targeting of women in the industry. Earlier this week she caught the attention of users of the pro-Gamergate message board 8chan after Tweeting snark about the movement, only to then see users of that board mock her, post details about her husband and ultimately publish her personal information (a screencap of a post with redacted info remained on the thread on Saturday).

“I was literally watching 8chan go after me in their specific chatroom for Gamergate,” she told Kotaku today. “They posted my address, and within moments I got that death threat.”

The only people circulating the home addresses of Anita Sarkeesian, Brianna Wu, or Zoe Quinn are from GamerGate. Whoever used those addresses to drive these women from their homes must have been, at minimum, assisted by GamerGate, which is itself a crime. Nor is Brianna Wu an exception, as Anita Sarkeesian demonstrates:

Multiple specific threats made stating intent to kill me & feminists at USU. For the record one threat did claim affiliation with #gamergate

Note too that Sommers thinks it’s unlikely that someone from a movement known for spreading lies and vicious rhetoric about Sarkeesian could have issued a death threat. She must think a death threat is just as likely to come from the florist down the street, blissfully unaware of video games, or a Supreme Court judge, or a five-year-old who can’t pick up a controller. To think that everyone’s equally likely to issue a death threat is a remarkably pessimistic view of humanity. But back to Sommers:

Colin Campbell, the senior reporter at Polygon for example, called me a “reactionary” and he suggested that my indifference to sexism in videos was a “irresponsible abrogation of our shared humanity.” I don’t doubt Mr. Campbell’s sincerity: many games do depict horrific violence, and mistreatment of women.

It’s fascinating how she reduces Campbell to a string of insults. His critique had far more substance than that:

Everything Sommers says comes from the assumption, asserted early in her video, that hardcore games are consumed by men because they are made for men, as if they were in the same category as aftershave and Men’s Health magazine.

But although male domination has been the status quo for many years, the influx of women playing games is a sign that women like to play games. “Hardcore games,” of the kind that women don’t play so much as casual games, are not marketed to address a particularly male need any more than blockbuster movies are; they are male-centric because their makers have failed to figure out how to make them more interesting to women.

All entertainment features subsets of products that are clearly aimed at men or women. Just take a look at the bookshelf in your local supermarket. But games have fallen into this male-centric locus because their makers have not been smart enough to reach outside their historic core target.

But given the choice, Sommers would rather focus on surface gloss instead of substance. This should be a red flag to skeptics that her arguments are weaker than they first appear. Speaking of which:

But remember, there is vastly more violence and mistreatment of men!

This is misleading. It’s true that the cannon fodder tends to be male. But what’s under discussion isn’t raw body counts, it’s representation and erasure. Yes, men are frequently used for target practice, but it’s also true that men take the leading roles and get long-running character arcs that fully flesh them out as human beings. Consider Nathan Drake of Uncharted, Marcus Fenix of Gears of War, Sora of Kingdom Hearts, or William Joseph Blazkowicz of Wolfenstein 3D. Women very rarely get starring roles, or for that matter show up at all. If all you had of the human race was our video games, you’d never guess that half our species was female.

So when women do show up, they’re the exotic “other.” They’re special, and rarely given time to develop their characters beyond the first dimension. Hence, even if women are more likely to be brutalized than men, in terms of raw numbers they’re a very small share of the body count. Sommers is ignoring one form of sexism in order to refute another!

the feminist critic Anita Sarkeesian disagrees. She has called the game “pernicious” and she faults its “shameless sexism,” and the use of the “male gaze.” “Everything about Bayonetta’s design,” says Sarkeesian, “is created specifically for the sexual pleasure of straight male gamers.” Those were her words. Now her critique relies on a 1975 feminist theory about the “male gaze” and how it objectifies and demeans women. But “gaze” theory has evolved since 1975! It turns out that spectators might be able to gaze at a women’s beauty and also identify with her at a human level.

It’s a stretch to call the “Male Gaze” a theory, as Laura Mulvey’s essay was intended to be more polemical than intellectually rigorous (and she invokes quite a bit of Freud).[7] Nonetheless, others realized she was on to something. Heterosexual men are sexually attracted to women, and tend to view other men as rivals for that attraction. This translates into a distinctive “gaze,” or viewpoint, in narratives. The classic example is the introduction of Ursula Andress’ Honey Ryder in “Dr. No.” She emerges from the ocean partially clothed, as James Bond peeps from the bushes. The camera is reflecting a heterosexual male point of view, and catering to its preferences. There is such a thing as a “female gaze;” compare that scene to one in Casino Royale, where James Bond (Daniel Craig) emerges shirtless from the surf. For that moment, he is being sexualized. You can argue for other, non-heteronormative gazes, and some researchers have,[8] but those two types are the most common.

There’s nothing bad about a male or female gaze per se; the problem comes when one becomes dominant. Both men and women see movies, after all, so to appeal to everyone you’d expect movies to be primarily neutral, with moments of male- or female-gaze taking the fore about equally. Instead, the male gaze tends to dominate. This torques women’s view of themselves; a recent study[9] found that college women experienced more body shame and anxiety about their appearance when they were told they’d be interacting with a man, as opposed to interacting with a woman or no-one at all. Effect sizes were moderate, with one of the largest being Cohen’s d = 0.59.
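(A quick aside for anyone unfamiliar with the statistic: Cohen’s d is simply the difference between two group means, expressed in units of their pooled standard deviation:

d = (M₁ − M₂) / s_pooled

By Cohen’s own rough benchmarks, 0.2 counts as a small effect, 0.5 as medium, and 0.8 as large, so a d of 0.59 means the two groups’ averages sat more than half a standard deviation apart.)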

Objectification isn’t the same thing as the male gaze, but the two are frequently confused. James Bond is a subject; he can choose whether or not to act, and those choices affect the world around him. James Bond’s watch is an object; it does not act by itself, but subjects like James Bond can use it to perform actions. Honey Ryder is more object than subject: she’s there to help Bond defeat Dr. No, where “help” means both literally and metaphorically being pulled around by the arm, spouting worthless exposition, sitting out the final battle until Bond rescues her, then having hot sex with this near-stranger. By the next movie, she’s forgotten and replaced with a prettier model: Tatiana Romanova, played by Daniela Bianchi, the second in a long line of interchangeable “Bond Girls.”

The confusion between the Male Gaze and objectification stems from how often the two travel together. If men are rivals under the male gaze, then they tend to be involved in a struggle for power and control. This bleeds over into sexuality, resulting in women being reduced to conquests, trophies, and symbols of virility. Objectification is a natural consequence of the Male Gaze, but only because of the assumptions we absorb from our culture.

Summing up, Sommers is close enough to correct when she says “gaze theory” has evolved since the 1970s, but her claim that women can be objects and subjects at once is at odds with the evidence. In the extreme, it’s logically impossible: how can you simultaneously have agency and lack it? That’s an embarrassing oversight for a philosopher.

But what about Sarkeesian’s claim that Bayonetta is designed to appeal to the straight male? Let’s consult a neutral source on the matter.

Bayonetta is portrayed as a tall, beautiful, young woman with a slender but curvy figure much like the other Umbra Witches in her clan. She has black hair wrapped into a beehive-like hairdo and gray eyes with a mole located at bottom of her left cheek close to her lips. Her main attire is composed of a skin-tight suit made out of her hair that has a rose design on the abdomen with long white gloves, black and gray heels, thin gold chains, three small belts strapped on each arm, and a pair of gold, cat-shaped earrings. […]

Because of her hair based fighting techniques, Bayonetta’s outfit becomes more revealing when she uses Wicked Weave techniques. Her suit’s inner section remains running up the middle and back of her body and her hair drapes over her chest to cover it, but the rest of the suit and the sleeves of hair vanish and trail outwards from her head in a spiral of hair and gold chain used to summon the demonic limbs. When summoning full demons, the entire suit disappears and leaves behind her gloves, shoes and watch.

Video game players are rewarded for successfully completing complex attacks with a strip tease by an attractive woman. It’s no surprise Sommers ignored Sarkeesian’s argument, because otherwise she would have been forced to agree.

If you’ve been following the news, you’ve probably seen alarming stories about an army of angry and vicious video gamers, marching under a banner called “GamerGate.” Well, according to these reports, this mob will stop at nothing to defend its “heteronormative privilege.”

Sommers says “heteronormative privilege” as if she’s quoting someone, and by connecting it to the news she makes it sound like a common claim. But a simple news search reveals nothing. Expanding to a normal web search, I can find a blog post by Cathy Smith, but she doesn’t apply the label to GamerGate at all.

It’s not surprising that many of the people who believe in GamerGate see cliques in game development and press. It’s possible these people have dealt with cliques in school, and I do believe that many of the people involved in this are still in school and feel like they’re at the bottom of the rung. Hell, I had no conception of sexism in middle school, and I had internalized a lot of misogyny that I hadn’t realized was a part of me until late high school. It’s hard to understand the concept of male privilege or white privilege or heteronormative privilege when you have to get permission to go to the damn bathroom.

I can find a Tumblr post about gay erasure in gaming, but it dates from before the name “GamerGate” was even coined.

And that’s it.

Where are these claims of “heteronormative privilege?” Sommers must have thought they were so prevalent that she didn’t need to cite them, yet that’s clearly not the case.

Today, at least in certain feminist circles, it’s open season on the sexual preferences of straight males.

It’s curious that someone who dubs themselves the “Factual Feminist” would make claims about feminism without evidence. This should have been a trivially easy citation for Sommers, yet she doesn’t bother. If history is any guide, that’s probably because she has none.

They need to show, not dogmatically assume, that video games make people sexist. The burden of proof rests with them.

By my count, I’ve provided at least five citations demonstrating that video games are sexist, and at least three demonstrating that this has real-world consequences. Sommers, in contrast, has failed to provide a single one to support her view.

So, who is Christina Hoff Sommers? Possibly someone who quote-mines heated rhetoric from summaries and ignores substantive critique. Certainly a spokesperson for bigots, who’s made a career out of whitewashing anti-feminist hate with a superficial gloss of pseudo-intellectualism. Her legacy will be one of promoting the suffering and misery of all genders, presumably just to line her pockets with cold, hard cash.

Illuminati Lich (10:07 AM – 4 Nov 2014):
[JT Eberhard,] A video by Sommers?

JT Eberhard (10:07 AM – 4 Nov 2014):
[Illuminati Lich,] Yeah. I agree with most of what she said.

D.J. Grothe (2:31 PM – 1 Sep 2014):
[Sommers,] You’re a mythbuster in the grand tradition of those who debunk harmful nonsense, speaking truth to power in the public interest.

Richard Dawkins (12:27 AM – 17 Sep 2014):
The “Big Sister is Watching You” Thought Police hate [Sommers]’ Factual Feminism, and you can see why.

Fortunately, good skeptics are capable of looking past the false front and know not to take her claims seriously. I’m not the only one to spot this, by any means:

Flanders, Laura. “The ‘Stolen Feminism’ Hoax.” Extra!, September 1, 1994.

Presley, Sharon. “Freedom Feminism Still Isn’t Either.” Reason.com, January 30, 2014.

Malmsheimer, Taylor. “Conservatives Are Obsessed With Debunking the 1-in-5 Rape Statistic. They’re Wrong, Too.” The New Republic, June 27, 2014.

Ampersand. “Fact-Checking the Anti-Feminists; like Following around an Elephant with a Bucket, No Matter How Much Crap You Clean up They Keep Producing More.” Alas, a Blog. Accessed December 7, 2014.

Johnston, Angus. “Yes, Christina Hoff Sommers Is a Rape Denialist.” Accessed December 10, 2014.

Citations:

[1] Baron, Stephen W. “General Strain, Street Youth and Crime: A Test of Agnew’s Revised Theory.” Criminology 42, no. 2 (May 1, 2004): 457–84. doi:10.1111/j.1745-9125.2004.tb00526.x.

[2] Burgess, Melinda C. R., Steven Paul Stermer, and Stephen R. Burgess. “Sex, Lies, and Video Games: The Portrayal of Male and Female Characters on Video Game Covers.” Sex Roles 57, no. 5-6 (2007): 419–33.

[3] Dill, Karen E., and Kathryn P. Thill. “Video Game Characters and the Socialization of Gender Roles: Young People’s Perceptions Mirror Sexist Media Depictions.” Sex Roles 57, no. 11-12 (2007): 851–64.

[4] Behm-Morawitz, Elizabeth, and Dana Mastro. “The Effects of the Sexualization of Female Video Game Characters on Gender Stereotyping and Female Self-Concept.” Sex Roles 61, no. 11-12 (2009): 808–23.

[5] Dietz, Tracy L. “An Examination of Violence and Gender Role Portrayals in Video Games: Implications for Gender Socialization and Aggressive Behavior.” Sex Roles 38, no. 5-6 (1998): 425–42.

[6] Dill, Karen E., Brian P. Brown, and Michael A. Collins. “Effects of Exposure to Sex-Stereotyped Video Game Characters on Tolerance of Sexual Harassment.” Journal of Experimental Social Psychology 44, no. 5 (2008): 1402–8.

[7] Mulvey, Laura. “Visual Pleasure and Narrative Cinema.” Screen 16, no. 3 (1975): 6–18.

[8] Wood, Mitchell J. “The Gay Male Gaze.” Journal of Gay & Lesbian Social Services 17, no. 2 (December 2, 2004): 43–62. doi:10.1300/J041v17n02_03.

[9] Calogero, Rachel M. “A Test Of Objectification Theory: The Effect Of The Male Gaze On Appearance Concerns In College Women.” Psychology of Women Quarterly 28, no. 1 (March 2004): 16–21. doi:10.1111/j.1471-6402.2004.00118.x.

HJH @ 2014/12/10: Added another link to someone critiquing Hoff Sommers.
HJH @ 2015/02/04: It’s “embrace, extend, extinguish.” Stupid dyslexia.