Rationality Rules is a Violent Transphobe

I thought I knew how this post would play out. EssenceOfThought has gotten some flak for declaring Stephen Woodford to be a “violent transphobe,” which I didn’t think they deserved. They gave a good defense in one of their videos, starting off with a definition of violence.

You see, violence is defined as the following by the World Health Organization. Quote; “the intentional use of physical force or power, threatened or actual, against oneself, another person, or against a group or community, that either results in, or has a high likelihood of resulting in injury, death, psychological harm, maldevelopment or deprivation.”

EoT points out that controlling someone’s behaviour or social networks by using their finances as leverage can be considered economic violence. They also point out that using legislation to control access to abortion can be considered legislative violence, as it deprives a person of their right to bodily autonomy. And thus, as EoT explains,

When you exclude trans women from women’s sports you’re not simply violating numerous human rights. You’re designating them as not real women, as an invasive force coming to take what doesn’t belong to them. You are cultivating future transphobic violence.

Note the air gap: “cultivating violence” and “violence” are not the same thing, and the definition EoT quoted above places intent front-and-centre. EoT bridges the gap by pointing out they gave Rationality Rules several months to demonstrate he promoted violent policies out of ignorance, rather than with intent. When “he [doubled] down on his violent transphobia,” EoT had sufficient evidence of intent to justify calling him a “violent transphobe.”

At this point I’d shore up their one citation with a few more. This decoupling of physical force and violence is not a new argument in the philosophy and social sciences literature.

Violence often involves physical force, and the association of force with violence is very close: in many contexts the words become synonyms. An obvious instance is the reference to a violent storm, a storm of great force. But in human affairs violence and force, cannot be equated. Force without violence is often used on a person’s body. If a person is in the throes of drowning, the standard Red Cross life-saving techniques specify force which is certainly not violence. To equate an act of rescue with an act of violence would be to lose sight entirely of the significance of the concept. Similarly, surgeons and dentists use force without doing violence.

Violence in human affairs is much more closely connected with the idea of violation than with the idea of force. What is fundamental about violence is that a person is violated. And if one immediately senses the truth of that statement, it must be because a person has certain rights which are undeniably, indissolubly, connected with being a person. One of these is a right to one’s body, to determine what one’s body does and what is done to one’s body — inalienable because without one’s body one would cease to be a person. Apart from a body, what is essential to one’s being a person is dignity. The real dignity of a person does not consist in remaining “dignified”, but rather in the ability to make decisions.

Garver, Newton. “What violence is.” The Nation 209.24 (1968): 819-822.

As a point of departure, let us say that violence is present when human beings are being influenced so that their actual somatic and mental realizations are below their potential realizations. […]

The first distinction to be made is between physical and psychological violence. The distinction is trite but important mainly because the narrow concept of violence mentioned above concentrates on physical violence only. […] It is useful to distinguish further between ’biological violence’, […] and ’physical violence as such’, which increases the constraint on human movements – as when a person is imprisoned or put in chains, but also when access to transportation is very unevenly distributed, keeping large segments of a population at the same place with mobility a monopoly of the selected few. But that distinction is less important than the basic distinction between violence that works on the body, and violence that works on the soul; where the latter would include lies, brainwashing, indoctrination of various kinds, threats, etc. that serve to decrease mental potentialities. […]

We shall refer to the type of violence where there is an actor that commits the violence as personal or direct, and to violence where there is no such actor as structural or indirect. In both cases individuals may be killed or mutilated, hit or hurt in both senses of these words, and manipulated by means of stick or carrot strategies. But whereas in the first case these consequences can be traced back to concrete persons as actors, in the second case this is no longer meaningful. There may not be any person who directly harms another person in the structure. The violence is built into the structure and shows up as unequal power and consequently as unequal life chances.

Galtung, Johan. “Violence, peace, and peace research.” Journal of peace research 6.3 (1969): 167-191.

This expansive definition of “violence” has been influential: Galtung’s fifty-year-old paper above has been cited over 6,000 times according to Google Scholar. “Influential” is not a synonym for “consensus,” however.

Nearly all inquiries concerning the phenomenon of violence demonstrate that violence not only takes on many forms and possesses very different characteristics, but also that the current range of definitions is considerable and creates ample controversies concerning the question what violence is and how it ought to be defined (…). Since there are so many different kinds of violence (…) and since violence is studied from different actor perspectives (i.e. perpetrator, victim, third party, neutral observer), existing literature displays a wide variety of definitions based on different theoretical and, sometimes even incommensurable domain assumptions (e.g. about human nature, social order and history). In short, the concept of ‘violence’ is notoriously difficult to define because as a phenomenon it is multifaceted, socially constructed and highly ambivalent. […]

Violence is socially constructed because who and what is considered as violent varies according to specific socio-cultural and historical conditions. While legal scholars may require narrow definitions for punishable acts, the phenomenon of violence is invariably more complex in social reality. Not only do views about violence differ, but feelings regarding physical violence also change under the influence of social and cultural developments. The meanings that participants in a violent episode give to their own and other’s actions and experiences vary and can be crucial for deciding what is and what is not considered as violence since there is no simple relationship between the apparent severity of an attack and the impact that it has upon the victim. For example, in some cases, verbal aggression may prove to be more debilitating than physical attack.

De Haan, Willem. “Violence as an essentially contested concept.” Violence in Europe. Springer, New York, NY, 2008. 27-40.

A major objection to this inclusive definition of violence is that it makes everything violence, creating confusion instead of clarity. One example:

If violence is violating a person or a person’s rights, then every social wrong is a violent one, every crime against another a violent crime, every sin against one’s neighbor an act of violence. If violence is whatever violates a person and his rights of body, dignity, or autonomy, then lying to or about another, embezzling, locking one out of his house, insulting, and gossiping are all violent acts.

Betz, Joseph. “Violence: Garver’s definition and a Deweyan correction.” Ethics 87.4 (1977): 339-351.

The problem with this objection is that it assumes violence is binary: things are either violent, or they are not. Almost nothing in life falls in a binary, sex included, so a much more plausible model for violence is a continuum. I’m convinced that even the people who buy into a violence binary also accept that violence falls on a continuum, as I have yet to hear anyone argue that murder and wet willies are equally bad. Thus eliminating the binary and declaring all violence to fall on a continuum is a simpler theory, and by Occam’s razor should be favoured until contrary evidence comes along.

The other major objection is that while not every human society agrees on what constitutes violence, all of them agree that physical violence is violence. Sometimes this objection can be quite subtle:

Albeit rare, there are cases of violence occurring without rights being violated. This point has been made by Audi (1971, p. 59): ‘[while] in the most usual cases violence involves the violation of some moral right …there are also cases, like wrestling and boxing, in which even paradigmatic violence can occur without the violation of any moral right’.

Bufacchi, Vittorio. “Two concepts of violence.” Political Studies Review 3.2 (2005): 193-204.

That quote only works if you think wrestling is paradigmatic, something everyone agrees counts as violence. Wrestling fans would disagree, and either point to the hardcore training and co-operation involved or the efforts made to prevent injury, depending on which fandom you were querying. Societies definitely disagree on what physical acts count as violence, and even within a single country physical acts that are considered horrifically immoral to many today were perfectly acceptable to many a century ago. This pragmatic argument can also be turned on its head, by pointing out that if violence is binary then we wouldn’t expect a correlation between (for example) hostile views of women and violence towards women. If a violence continuum exists, however, such a correlation must exist.

Studies using Glick and Fiske’s (1996) Ambivalent Sexism Inventory, which contains different subscales for benevolent and hostile sexism, support this idea. Studies have found that greater endorsement of hostile sexism predicted more positive attitudes toward violence against a female partner (Forbes, Jobe, White, Bloesch, & Adams-Curtis, 2005; Sakalli, 2001). Other studies of IPV among college samples have found that men with more hostile sexist attitudes were more likely to have committed verbal aggression (Forbes et. al., 2004) and sexual coercion (Forbes & Adams-Curtis, 2001; Forbes et al., 2004).

Allen, Christopher T., Suzanne C. Swan, and Chitra Raghavan. “Gender symmetry, sexism, and intimate partner violence.” Journal of interpersonal violence 24.11 (2009): 1816-1834.

At this point in the post, though, I was supposed to pump the brakes a little. People have certain ideas in mind when you say “violence,” I’d say, and would likely equivocate between physical and non-physical violence. This would poison the well. Of course you can’t change language or create awareness by sitting on your hands, so EssenceOfThought were 100% in the right in arguing Rationality Rules was a violent transphobe, but at the same time I wasn’t willing to join in. I needed more time to think about it. After finishing that paragraph, I’d title this post “Rationality Rules is a ‘Violent’ Transphobe” and punch the Publish button.

But now that I’ve finished gathering my sources and writing this post, I have had time to think about it. I cannot find a good reason to reject the violence-as-intentional-rights-violation definition; in particular, I cannot come up with a superior alternative. Rationality Rules argues that the rights of some transgender people should be restricted, via special pleading. As I point out at that link, Stephen Woodford is aware of the argument from human rights, so he cannot claim his restriction is being done out of ignorance. That gives us proof of intent.

So no quote marks are necessary: I too believe Rationality Rules is a violent transphobe, for the definitions and reasons above.

Rationality Rules is an Abusive Transphobe

Abuse comes in more forms than many people realize. Take financial abuse, where someone uses economic leverage to control you, or reproductive coercion, or this behaviour.

Gaslighting is a form of emotional abuse where the abuser intentionally manipulates the physical environment or mental state of the abusee, and then deflects responsibility by provoking the abusee to think that the changes reside in their imagination, thus constituting a weakened perception of reality (Akhtar, 2009; Barton & Whitehead, 1969; Dorpat, 1996; Smith & Sinanan, 1972). By repeatedly and convincingly offering explanations that depict the victim as unstable, the abuser can control the victim’s perception of reality while maintaining a position of truth-holder and authority.

Roberts, Tuesda, and Dorinda J. Carter Andrews. “A Critical Race Analysis of the Gaslighting against African American Teachers.” Contesting the Myth of a “Post Racial Era”: The Continued Significance of Race in US Education, 2013, 69–94.

A small but growing amount of the scientific literature considers gaslighting a form of abuse. It’s also worth knowing about a close cousin of gaslighting known as “DARVO.”

DARVO refers to a reaction perpetrators of wrong doing, particularly sexual offenders, may display in response to being held accountable for their behavior. DARVO stands for “Deny, Attack, and Reverse Victim and Offender.” The perpetrator or offender may Deny the behavior, Attack the individual doing the confronting, and Reverse the roles of Victim and Offender such that the perpetrator assumes the victim role and turns the true victim — or the whistle blower — into an alleged offender. This occurs, for instance, when an actually guilty perpetrator assumes the role of “falsely accused” and attacks the accuser’s credibility and blames the accuser of being the perpetrator of a false accusation. […]

In a 2017 peer-reviewed open-access research study, Perpetrator Responses to Victim Confrontation: DARVO and Victim Self-Blame, Harsey, Zurbriggen, & Freyd reported that: “(1) DARVO was commonly used by individuals who were confronted; (2) women were more likely to be exposed to DARVO than men during confrontations; (3) the three components of DARVO were positively correlated, supporting the theoretical construction of DARVO; and (4) higher levels of exposure to DARVO during a confrontation were associated with increased perceptions of self-blame among the confronters. These results provide evidence for the existence of DARVO as a perpetrator strategy and establish a relationship between DARVO exposure and feelings of self-blame.”

If DARVO seems vaguely familiar, that’s because it’s a popular tactic in the far-Right. Brett Kavanaugh used it during his Congressional hearing, this YouTuber encountered it quite a bit among the Proud Boys, and even RationalWiki’s explanation of it invokes the Christian far-Right. DARVO may be common among sexual abusers, but it’s important to stress that it’s not exclusive to them. It’s best to think of this solely as an abusive tactic to evade scrutiny, without that extra baggage.

Rationality Rules DESTROYS Women’s Sport!!1!

I still can’t believe this post exists, given its humble beginnings.

The “women’s category” is, in my opinion, poorly named given our current climate, and so I’d elect a name more along the lines of the “Under 5 nmol/l category” (as in, under 5 nanomoles of testosterone per litre), but make no mistake about it, the “woman’s category” is not based on gender or identity, or even genitalia or chromosomes… it’s based on hormone levels and the absence of male puberty.

The above comment wasn’t in Rationality Rules’ latest transphobic video, it was just a casual aside by RR himself in the YouTube comment section. He’s obliquely doubled-down via Twitter (hat tip to Essence of Thought):

Of course, just as I support trans men competing in all “men’s categories” (poorly named), women who have not experienced male puberty competing in all women’s sport (also poorly named) and trans women who have experienced male puberty competing in long-distance running.

To further clarify, I think that we must rename our categories according to what they’re actually based on. It’s not right to have a “women’s category” and yet say to some trans women (who are women!) that they can’t compete within it; it should be renamed.

The proposal itched away at me, though, because I knew it was testable.

There is a need to clarify hormone profiles that may be expected to occur after competition when antidoping tests are usually made. In this study, we report on the hormonal profile of 693 elite athletes, sampled within 2 h of a national or international competitive event. These elite athletes are a subset of the cross-sectional study that was a component of the GH-2000 research project aimed at developing a test to detect abuse with growth hormone.

Healy, Marie-Louise, et al. “Endocrine profiles in 693 elite athletes in the postcompetition setting.” Clinical endocrinology 81.2 (2014): 294-305.

The GH-2000 project had already done the hard work of collecting and analyzing blood samples from athletes, so checking RR’s proposal was no tougher than running some numbers. There are all sorts of ethical guidelines around sharing medical info, but fortunately there’s an easy shortcut: ask one of the scientists involved to run the numbers for me, and report back the results. Aggregate data is much more resistant to de-anonymization, so the ethical concerns are greatly reduced. The catch, of course, is that I’d have to find a friendly researcher with access to that dataset. About a month ago, I fired off some emails and hoped for the best.

I wound up much, much better than the best. I got full access to the dataset!! You don’t get handed an incredible gift like this and merely use it for a blog post. In my spare time, I’m flexing my Bayesian muscles to do a re-analysis of the above paper, while also looking for observations the original authors may have missed. Alas, that means my slow posting schedule is about to crawl.

But in the meantime, we have a question to answer.

What Do We Have Here?


import numpy as np
import pandas as pd

dataset = pd.read_csv('dataset.minimal.tsv',sep='\t')

# The 'Gender' column is coded numerically: 2 = assigned female, 1 = assigned male
mask_afab = dataset['Gender']==2
mask_amab = dataset['Gender']==1

print( "{:24} = {}".format("Total Assigned-female Athletes", np.sum(mask_afab)) )
print( "{:24} = {:.2f} cm".format("  Height, Mean", np.mean( dataset['height'][mask_afab] )) )
print( "{:24} = {:.2f} cm".format("  Height, Std.Dev", np.std( dataset['height'][mask_afab] )) )
print( "{:24} = {:.2f} kg".format("  Weight, Mean", np.mean( dataset['weight'][mask_afab] )) )
print( "{:24} = {:.2f} kg".format("  Weight, Std.Dev", np.std( dataset['weight'][mask_afab] )) )
print( "{:24} = {:.2f} kg".format("  Body Fat, Mean", np.mean( (dataset['weight']*dataset['body-fat']*.01)[mask_afab] )) )
print( "{:24} = {:.2f} kg".format("  Body Fat, Std.Dev", np.std( (dataset['weight']*dataset['body-fat']*.01)[mask_afab] )) )

print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Mean", np.mean( dataset['Testo'][mask_afab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Std.Dev", np.std( dataset['Testo'][mask_afab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Max", np.max( dataset['Testo'][mask_afab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Min", np.min( dataset['Testo'][mask_afab] )) )
print()

print( "{:24} = {}".format("Total Assigned-male Athletes", np.sum(mask_amab) ) )
print( "{:24} = {:.2f} cm".format("  Height, Mean", np.mean( dataset['height'][mask_amab] )) )
print( "{:24} = {:.2f} cm".format("  Height, Std.Dev", np.std( dataset['height'][mask_amab] )) )
print( "{:24} = {:.2f} kg".format("  Weight, Mean", np.mean( dataset['weight'][mask_amab] )) )
print( "{:24} = {:.2f} kg".format("  Weight, Std.Dev", np.std( dataset['weight'][mask_amab] )) )
print( "{:24} = {:.2f} kg".format("  Body Fat, Mean", np.mean( (dataset['weight']*dataset['body-fat']*.01)[mask_amab] )) )
print( "{:24} = {:.2f} kg".format("  Body Fat, Std.Dev", np.std( (dataset['weight']*dataset['body-fat']*.01)[mask_amab] )) )

print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Mean", np.mean( dataset['Testo'][mask_amab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Std.Dev", np.std( dataset['Testo'][mask_amab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Max", np.max( dataset['Testo'][mask_amab] )) )
print( "{:24} = {:.2f} nmol/L".format("  Testosterone, Min", np.min( dataset['Testo'][mask_amab] )) )

Total Assigned-female Athletes = 239
  Height, Mean           = 171.61 cm
  Height, Std.Dev        = 7.12 cm
  Weight, Mean           = 64.27 kg
  Weight, Std.Dev        = 9.12 kg
  Body Fat, Mean         = 13.19 kg
  Body Fat, Std.Dev      = 3.85 kg
  Testosterone, Mean     = 2.68 nmol/L
  Testosterone, Std.Dev  = 4.33 nmol/L
  Testosterone, Max      = 31.90 nmol/L
  Testosterone, Min      = 0.00 nmol/L

Total Assigned-male Athletes = 454
  Height, Mean           = 182.72 cm
  Height, Std.Dev        = 8.48 cm
  Weight, Mean           = 80.65 kg
  Weight, Std.Dev        = 12.62 kg
  Body Fat, Mean         = 8.89 kg
  Body Fat, Std.Dev      = 7.20 kg
  Testosterone, Mean     = 14.59 nmol/L
  Testosterone, Std.Dev  = 6.66 nmol/L
  Testosterone, Max      = 41.00 nmol/L
  Testosterone, Min      = 0.80 nmol/L

The first step is to get a basic grasp on what’s there, via some crude descriptive statistics. It’s also useful to compare these with the original paper, to make sure I’m interpreting the data correctly. Excusing some minor differences in rounding, the above numbers match the paper.
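As a cross-check on numbers like the ones above, the same kind of summary can be produced in one pass with a grouped aggregation. This is only a minimal sketch on made-up toy rows, not the GH-2000 data, and `describe_by_group` and `pop_std` are hypothetical helper names. NumPy's `np.std` defaults to `ddof=0` while pandas' own `.std()` defaults to `ddof=1`, so the sketch pins `ddof=0` to stay comparable with the NumPy-based code.

```python
import numpy as np
import pandas as pd

def pop_std(series):
    """Population standard deviation (ddof=0), matching np.std's default."""
    return series.std(ddof=0)

def describe_by_group(df, group_col, value_cols):
    """Count, mean, population std, min and max of each column, per group.

    Hypothetical helper for illustration; missing values are skipped.
    """
    return df.groupby(group_col)[value_cols].agg(
        ['count', 'mean', pop_std, 'min', 'max']
    )

# Toy rows only: two groups, one missing height
toy = pd.DataFrame({
    'Gender': [1, 1, 2, 2],
    'height': [180.0, 184.0, 170.0, np.nan],
})
summary = describe_by_group(toy, 'Gender', ['height'])
print(summary)
```

With the real file, the call would presumably be `describe_by_group(dataset, 'Gender', ['height', 'weight', 'Testo'])`, which assumes the column names used in the code above.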

The only thing that stands out from the above, to me, is the serum levels of testosterone. At least one source says the mean of these assigned-female athletes is higher than the normal range for their non-athletic cohorts. Part of that may simply be because we don’t have a good idea of what the normal range is, so it’s not uncommon for each lab to have their own definition of “normal.” This is even worse for those assigned female, since their testosterone levels are poorly studied; note that my previous link collected the data of over a million “men,” but doesn’t mention “women” once. Factor in inaccurate test results and other complicating factors, and “normal” is quite poorly-defined.

Still, Rationality Rules is either convinced those complications are irrelevant, or ignorant of them. And, to be fair, that 5nmol/L line implicitly sweeps a lot of them under the rug. Let’s carry on, then, and look for invalid data. “Invalid” covers everything from missing data, to impossible data, and maybe even data we think might be made inaccurate due to measurement error. I consider a concentration of zero testosterone as invalid, even though it may technically be possible.


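# Rows with a numeric, non-negative reading (NaN fails every comparison, so missing values drop out)
t_number = dataset['Testo'] >= 0
t_max = np.max(dataset['Testo'][t_number])
t_min = np.min(dataset['Testo'][t_number])
# "Valid" readings: strictly above 0.5 nmol/L and finite
t_valid = (dataset['Testo'] > 0.5) & np.isfinite(dataset['Testo'])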

print( "{:52} = {}".format("Total Assigned-male Athletes w/ T levels >= 0", np.sum(mask_amab & t_number) ) )
print( "{:52} = {}".format("                             w/ T levels <= 0.5", np.sum(mask_amab & (dataset['Testo']<=0.5)) ) )
print( "{:52} = {}".format("                             w/ T levels == 0", np.sum(mask_amab & (dataset['Testo']==0)) ) )
print( "{:52} = {}".format("                             w/ missing T levels", np.sum(mask_amab & np.isnan(dataset['Testo'])) ) )
print( "{:52} = {}".format("                             that I consider valid", np.sum(mask_amab & t_valid)) )

print()

print( "{:52} = {}".format("Total Assigned-female Athletes w/ T levels >= 0", np.sum(mask_afab & t_number)) )
print( "{:52} = {}".format("                               w/ T levels <= 0.5", np.sum(mask_afab & (dataset['Testo']<=0.5)) ) )
print( "{:52} = {}".format("                               w/ T levels == 0", np.sum(mask_afab & (dataset['Testo']==0)) ) )
print( "{:52} = {}".format("                               w/ missing T levels", np.sum(mask_afab & np.isnan(dataset['Testo'])) ) )
print( "{:52} = {}".format("                               that I consider valid", np.sum(mask_afab & t_valid)) )

Total Assigned-male Athletes w/ T levels >= 0        = 446
                             w/ T levels <= 0.5      = 0
                             w/ T levels == 0        = 0
                             w/ missing T levels     = 8
                             that I consider valid   = 446

Total Assigned-female Athletes w/ T levels >= 0      = 234
                               w/ T levels <= 0.5    = 5
                               w/ T levels == 0      = 1
                               w/ missing T levels   = 5
                               that I consider valid = 229
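The validity rule used above can be seen in miniature on a handful of toy values. This is a sketch with made-up readings, not rows from the dataset; it shows that a missing value (NaN) already fails the `> 0.5` comparison, so the `np.isfinite` check acts as an explicit safeguard and does not remove extra rows in this example.

```python
import numpy as np
import pandas as pd

# Toy readings in nmol/L (illustrative only, not taken from the dataset)
testo = pd.Series([0.0, 0.4, 0.5, 2.7, np.nan, 31.9])

above_floor = testo > 0.5                 # NaN > 0.5 evaluates to False
valid = above_floor & np.isfinite(testo)  # same rule as t_valid above

print("missing       :", int(testo.isna().sum()))
print("<= 0.5 nmol/L :", int((testo <= 0.5).sum()))
print("valid         :", int(valid.sum()))
print("isfinite changed nothing:", bool((above_floor == valid).all()))
```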

Fortunately for us, the losses are pretty small. 229 datapoints is a healthy sample size, so we can afford to be liberal about what we toss out. Next up, it would be handy to see the data in chart form.


# %matplotlib notebook  # makes the plots interactive, but only one can be active
%matplotlib inline 

import matplotlib.pyplot as pp
pp.rcParams['figure.dpi'] = 96      # MUST SET THIS FIRST
pp.rcParams['figure.figsize'] = [9.5, 6]

bins = 9
pp.hist( np.log(dataset['Testo'][mask_afab & t_valid]), bins, density=1, facecolor='black', alpha=0.2)
pp.hist( np.log(dataset['Testo'][mask_amab & t_valid]), bins, density=1, facecolor='green', alpha=0.2)
pp.legend(['aFab','aMab'], loc=0)

pp.title('Testosterone, elite athletes')
pp.xlabel('nmol/L')
pp.xticks(np.linspace(-2,4,9), ["{:.1f}".format(np.exp(x)) for x in np.linspace(-2,4,9)])
pp.yticks([])

pp.axvline(np.log(0.5),0,1)
pp.axvline(np.log(5),0,1)
# Source: https://www.exeterlaboratory.com/test/testosterone/
pp.fill( np.log([29,8.6,8.6,29]), [1.2,1.2,0,0], facecolor='green', alpha=0.05 )
pp.fill( np.log([1.68,.3,.3,1.68]), [1.2,1.2,0,0], facecolor='black', alpha=0.05 )

pp.show()

I've put vertical lines at both the 0.5 and 5 nmol/L cutoffs. There's a big difference between categories, but we can see clouds on the horizon: a substantial number of assigned-female athletes have greater than 5 nmol/L of testosterone in their bloodstream, while a decent number of assigned-male athletes have less. How many?


mask_gt_5nmol = t_valid & (dataset['Testo'] > 5)
mask_lt_5nmol = t_valid & (dataset['Testo'] < 5)
mask_eq_5nmol = t_valid & (dataset['Testo'] == 5)

print("Segregating Athletes by Testosterone")

table = {"Concentration":["> 5nmol/L","< 5nmol/L","= 5nmol/L"],
        "aFab":[sum(mask_gt_5nmol & mask_afab),sum(mask_lt_5nmol & mask_afab),sum(mask_eq_5nmol & mask_afab)],
        "aMab":[sum(mask_gt_5nmol & mask_amab), sum(mask_lt_5nmol & mask_amab), sum(mask_eq_5nmol & mask_amab)]}
print(pd.DataFrame(table).to_string(index=False))
print()

print("{:.1f}% of assigned-female athletes have > 5nmol/L".format(100.*sum(mask_gt_5nmol & mask_afab)/sum(t_valid & mask_afab)))
print("{:.1f}% of assigned-male athletes have < 5nmol/L".format(100.*sum(mask_lt_5nmol & mask_amab)/sum(t_valid & mask_amab)))
print("{:.1f}% of athletes with > 5nmol/L are assigned-female".format(100.*sum(mask_gt_5nmol & mask_afab)/sum(mask_gt_5nmol)))
print("{:.1f}% of athletes with < 5nmol/L are assigned-male".format(100.*sum(mask_lt_5nmol & mask_amab)/sum(mask_lt_5nmol)))

Segregating Athletes by Testosterone
Concentration  aFab  aMab
   > 5nmol/L    19   417
   < 5nmol/L   210    26
   = 5nmol/L     0     3

8.3% of assigned-female athletes have > 5nmol/L
5.8% of assigned-male athletes have < 5nmol/L
4.4% of athletes with > 5nmol/L are assigned-female
11.0% of athletes with < 5nmol/L are assigned-male
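The four percentages can be re-derived from the table counts alone, which is a handy sanity check when the raw dataset isn't available. The sketch below hard-codes the counts printed above; the variable names are mine. Column shares answer "what fraction of each category falls on each side of the line", while row shares answer "who makes up each side of the line".

```python
import pandas as pd

# Counts copied from the table printed above
counts = pd.DataFrame(
    {"aFab": [19, 210, 0], "aMab": [417, 26, 3]},
    index=["> 5nmol/L", "< 5nmol/L", "= 5nmol/L"],
)

# Share of each category (column) that lands in each concentration band
col_share = 100.0 * counts / counts.sum(axis=0)
# Share of each concentration band (row) contributed by each category
row_share = 100.0 * counts.div(counts.sum(axis=1), axis=0)

print("{:.1f}% of aFab athletes have > 5nmol/L".format(col_share.loc["> 5nmol/L", "aFab"]))
print("{:.1f}% of aMab athletes have < 5nmol/L".format(col_share.loc["< 5nmol/L", "aMab"]))
print("{:.1f}% of athletes with > 5nmol/L are aFab".format(row_share.loc["> 5nmol/L", "aFab"]))
print("{:.1f}% of athletes with < 5nmol/L are aMab".format(row_share.loc["< 5nmol/L", "aMab"]))
```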

Looks like anywhere from 6-8% of athletes have testosterone levels that cross Rationality Rules' line. For comparison, maybe 1-2% of the general public has some level of gender dysphoria, though estimating exact figures is hard in the face of widespread discrimination and poor sex-ed in schools. Even that number is misleading, as the number of transgender athletes is substantially lower than 1-2% of the athletic population. The share of transgender athletes is irrelevant to this dataset anyway, as it was collected between 1996 and 1999, when no sporting agency had policies that allowed transgender athletes to openly compete.

That 6-8%, in other words, is entirely cisgender. This echoes one of Essence Of Thought's arguments: RR's 5nmol/L policy has far more impact on cis athletes than trans athletes, which could have catastrophic side-effects. Could is the operative word, though, because as of now we don't know anything about these athletes. Do >5nmol/L assigned-female athletes have bodies more like >5nmol/L assigned-male athletes than <5nmol/L assigned-female athletes? If so, then there's no problem. Equivalent body types are competing against each other, and outcomes are as fair as could be reasonably expected.

What, then, counts as an "equivalent" body type when it comes to sport?

Newton's First Law of Athletics

One reasonable measure of equivalence is height. It's one of the stronger sex differences, and height is also correlated with longer limbs and greater leverage. Whether that's relevant to sports is debatable, but height and correlated attributes dominate Rationality Rules' list.

[19:07] In some events - such as long-distance running, in which hemoglobin and slow-twitch muscle fibers are vital - I think there's a strong argument to say no, [transgender women who transitioned after puberty] don't have an unfair advantage, as the primary attributes are sufficiently mitigated. But in most events, and especially those in which height, width, hip size, limb length, muscle mass, and muscle fiber type are the primary attributes - such as weightlifting, sprinting, hammer throw, javelin, netball, boxing, karate, basketball, rugby, judo, rowing, hockey, and many more - my answer is yes, most do have an unfair advantage.

Fortunately for both of us, most athletes in the dataset have a "valid" height, which I define as being more than 30cm tall.


height_valid = dataset['height'] > 30
print("Out of {:3} athletes, {} have valid height data.".format(len(height_valid), sum(height_valid)) )

bins = 9

pp.hist( dataset['height'][mask_afab & height_valid], bins, density=1, facecolor='black', alpha=0.2)
pp.hist( dataset['height'][mask_amab & height_valid], bins, density=1, facecolor='green', alpha=0.2)
pp.legend(['aFab','aMab'], loc=0)

pp.title('Height, elite athletes')
pp.xlim([145,215])
pp.xlabel("cm")
pp.yticks([])

# source: https://ourworldindata.org/human-height, Germany, 1976
pp.axvline(166.3,0,1, color='k', alpha=0.2)
pp.axvline(np.mean(dataset['height'][mask_afab & height_valid]),0,1, color='k')
pp.axvline(179.3,0,1, color='g', alpha=0.2)
pp.axvline(np.mean(dataset['height'][mask_amab & height_valid]),0,1, color='g')

pp.show()

Out of 693 athletes, 678 have valid height data.

The faint vertical lines mark the mean adult height of Germans born in 1976, which should be a reasonable comparison cohort for European athletes who were active between 1996 and 1999, while the darker lines are each category's mean. Athletes seem slightly taller than the reference average, but only by roughly 3-5cm. The amount of overlap is also surprising, given that height is supposed to be a major sex difference. We actually saw less overlap with testosterone! Finally, the height distribution isn't quite Gaussian: there's a subtle bias towards the taller end of the spectrum.
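The "amount of overlap" is judged by eye from the histograms, but it could also be put into a number. One option, sketched here as an assumption rather than anything used in this analysis, is the overlapping coefficient: bin both samples on shared edges and sum the smaller of the two densities in each bin. It runs from 0 (no overlap) to 1 (identical histograms). The function name and the default bin count are hypothetical choices.

```python
import numpy as np

def overlap_coefficient(a, b, bins=20):
    """Histogram-based overlapping coefficient of two samples, in [0, 1].

    Hypothetical helper: non-finite values are dropped, both samples are
    binned on shared edges, and the smaller density per bin is summed.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[np.isfinite(a)]
    b = b[np.isfinite(b)]
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    dens_a, _ = np.histogram(a, bins=edges, density=True)
    dens_b, _ = np.histogram(b, bins=edges, density=True)
    return float(np.sum(np.minimum(dens_a, dens_b) * np.diff(edges)))

# Toy checks: identical samples overlap fully, well-separated ones not at all
same = overlap_coefficient([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
apart = overlap_coefficient([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
print(same, apart)
```

Applied here, the inputs would be the two height series (filtered by `height_valid`) and, for comparison, the two log-testosterone series (filtered by `t_valid`). The result depends on the bin count, so it is best read as a rough comparison, not a precise statistic.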

Height is a pretty crude metric, though. You could pair any athlete with a non-athlete of the same height, and there's no way the latter would perform as well as the former. A better measure of sporting ability would be muscle mass. We shouldn't use the absolute mass, though: bigger bodies have more mass and need more force to accelerate as quickly as smaller bodies do, so height and muscle mass are correlated. We need some sort of dimensionless scaling factor which compensates.

And we have one! It's called the Body Mass Index, or BMI.

$$ BMI = \frac w {h^2}, $$

where $w$ is a person's mass in kilograms, and $h$ is a person's height in metres. Unfortunately, BMI is quite problematic. Partly that's because it is a crude measure of obesity, and partly it's because two types of tissue can vary greatly, body fat and muscle, yet both contribute equally towards BMI.

That's all fixable. For one, some of the athletes in this dataset had their body fat measured. We can subtract that mass off, so their weight consists of tissues that are strongly correlated with height plus one that is fudgable: muscle mass. For two, we're not assessing these individuals' health, we only want a dimensionless measure of muscle mass relative to height. For three, we're not comparing these individuals to the general public, so we're not restricted to using the general BMI formula. We can use something more accurate.

The oddity is the appearance of that exponent 2, though our world is three-dimensional. You might think that the exponent should simply be 3, but that doesn't match the data at all. It has been known for a long time that people don't scale in a perfectly linear fashion as they grow. I propose that a better approximation to the actual sizes and shapes of healthy bodies might be given by an exponent of 2.5. So here is the formula I think is worth considering as an alternative to the standard BMI:

$$ BMI' = 1.3 \frac w {h^{2.5}} $$

I can easily pop body fat into Nick Trefethen's formula, and get a better measure of relative muscle mass,

$$ \overline{BMI} = 1.3 \frac{ w - bf }{h^{2.5}}, $$

where $bf$ is total body fat in kilograms. Individuals with excess muscle mass, relative to what we expect for their height, will have a high $\overline{BMI}$, and vice-versa. And as we saw earlier, muscle mass is another of Rationality Rules' determinants of sporting performance.
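
As a quick sanity check of that formula, here is a minimal sketch. The function name and the two athletes are hypothetical, and it assumes body fat arrives as a percentage of body weight (as it does in the dataset columns used in this post), so it is converted to kilograms first.

```python
def adjusted_bmi(weight_kg, height_cm, body_fat_pct):
    """Fat-free variant of Trefethen's BMI: 1.3 * (w - bf) / h**2.5."""
    fat_kg = weight_kg * body_fat_pct / 100.0   # body fat as a mass, in kg
    height_m = height_cm / 100.0                # the formula wants metres
    return 1.3 * (weight_kg - fat_kg) / height_m ** 2.5

# two made-up athletes with the same height and weight, but different body fat
lean = adjusted_bmi(70.0, 175.0, 10.0)
less_lean = adjusted_bmi(70.0, 175.0, 25.0)
print("{:.2f} vs {:.2f}".format(lean, less_lean))
```

At equal height and weight, the athlete carrying less fat gets the higher score, which is the behaviour we want from a stand-in for relative muscle mass.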

Time for more number crunching.

BMI_adj = 1.3*(dataset['weight']*(100. - dataset['body-fat']))*0.01/((dataset['height']*.01)**(2.5))
BMI_adj_valid = BMI_adj > 1

print( "Out of {:3} athletes, {} have valid adjusted BMIs.".format(len(BMI_adj_valid), sum(BMI_adj_valid)) )
print( "                     {} have valid weights.".format(sum(dataset['weight'] > 10)) )
print( "                     {} have valid body fat percentages.".format(sum(dataset['body-fat'] >= 0)) )
print()

print( "{:24} = {}".format("Total Assigned-female Athletes", np.sum(mask_afab)) )
print( "{:24} = {}".format(" total with valid adjusted BMI", np.sum(mask_afab & BMI_adj_valid)) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Mean", np.mean( BMI_adj[mask_afab & BMI_adj_valid] )) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Std.Dev", np.std( BMI_adj[mask_afab & BMI_adj_valid] )) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Median", np.median( BMI_adj[mask_afab & BMI_adj_valid] )) )
print()

print( "{:24} = {}".format("Total Assigned-male Athletes", np.sum(mask_amab)) )
print( "{:24} = {}".format(" total with valid adjusted BMI", np.sum(mask_amab & BMI_adj_valid)) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Mean", np.mean( BMI_adj[mask_amab & BMI_adj_valid] )) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Std.Dev", np.std( BMI_adj[mask_amab & BMI_adj_valid] )) )
print( "{:24} = {:.2f}".format("  adjusted BMI, Median", np.median( BMI_adj[mask_amab & BMI_adj_valid] )) )

Out of 693 athletes, 227 have valid adjusted BMIs.
                     663 have valid weights.
                     241 have valid body fat percentages.

Total Assigned-female Athletes = 239
 total with valid adjusted BMI = 86
  adjusted BMI, Mean     = 16.98
  adjusted BMI, Std.Dev  = 1.21
  adjusted BMI, Median   = 16.96

Total Assigned-male Athletes = 454
 total with valid adjusted BMI = 141
  adjusted BMI, Mean     = 20.56
  adjusted BMI, Std.Dev  = 1.88
  adjusted BMI, Median   = 20.28

The bad news is that most of this dataset lacks any information on body fat, which really cuts into our sample size. The good news is that we've still got enough to carry on. It also looks like there's a strong sex difference, and the distribution is pretty clustered. Still, a chart would help clarify the latter point.

bins = 9

pp.hist( np.log(BMI_adj)[mask_afab & BMI_adj_valid], bins, density=1, facecolor='black', alpha=0.2)
pp.hist( np.log(BMI_adj)[mask_amab & BMI_adj_valid], bins, density=1, facecolor='green', alpha=0.2)
pp.legend(['aFab','aMab'], loc=0)

pp.title('Adjusted BMI, elite athletes')
pp.xticks(np.linspace(np.log(14),np.log(28),8), 
          ["{:.1f}".format(np.exp(x)) for x in np.linspace(np.log(14),np.log(28),8)])
pp.yticks([])

pp.axvline(np.log(np.mean( BMI_adj[mask_afab & BMI_adj_valid] )),0,1, color='k')
pp.axvline(np.log(np.mean( BMI_adj[mask_amab & BMI_adj_valid] )),0,1, color='g')

pp.show()

Whoops! There's more overlap and skew than I thought. Even in logspace, the results don't look Gaussian. We'll have to remember that for the next step.

A Man Without a Plan is Not a Man

Just looking at charts isn't going to settle this question; we need to do some sort of hypothesis testing. Fortunately, all the pieces I need are here. We've got our hypothesis, for instance:

Athletes with exceptional testosterone levels are more like athletes of the same sex but with typical testosterone levels, than they are of other athletes with a different sex but similar testosterone levels.

If you know me, you know that I'm all about the Bayes, and that gives us our methodology.

1. Fit a model to a specific metric for assigned-female athletes with less than 5nmol/L of serum testosterone.
2. Fit a model to a specific metric for assigned-male athletes with more than 5nmol/L of serum testosterone.
3. Apply the first model to the test group, calculating the overall likelihood.
4. Apply the second model to the test group, calculating the overall likelihood.
5. Sample the probability distribution of the Bayes Factor.

"Metric" is one of height or $\overline{BMI}$, while "test group" is one of assigned-female athletes with >5nmol/L of serum testosterone or assigned-male athletes with <5nmol/L of serum testosterone. The Bayes Factor is simply

$$ \text{Bayes Factor} = \frac{ p(E \mid H_1) \cdot p(H_1) }{ p(E \mid H_2) \cdot p(H_2) } = \frac{ p(H_1 \mid E) }{ p(H_2 \mid E) }, $$

which means we need two hypotheses, not one. Fortunately, I've phrased the hypothesis to make it easy to negate: athletes with exceptional testosterone levels are less like athletes of the same sex but with typical testosterone levels, than they are of other athletes with a different sex but similar testosterone levels. We'll call this new hypothesis $H_2$, and the original $H_1$. Bayes factors greater than 1 mean $H_1$ is more likely than $H_2$, and vice-versa.
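
To make the ratio concrete, here is a minimal sketch with made-up heights and two fixed Gaussian models. Everything in it is an assumption for illustration: the numbers are not from the dataset, the helper name is hypothetical, and both hypotheses are given equal prior weight so the $p(H_1)$ and $p(H_2)$ terms cancel.

```python
import numpy as np
import scipy.stats as spst

def log_bayes_factor(x, model_1, model_2):
    """log p(x | model_1) - log p(x | model_2) for two (mu, sigma) Gaussians."""
    ll_1 = np.sum(spst.norm(*model_1).logpdf(x))
    ll_2 = np.sum(spst.norm(*model_2).logpdf(x))
    return ll_1 - ll_2

# made-up numbers, for illustration only
heights = np.array([168.0, 172.5, 175.0, 170.0])
h1_model = (171.0, 7.0)   # stands in for the "same sex" model
h2_model = (182.5, 8.5)   # stands in for the "similar testosterone" model

log_bf = log_bayes_factor(heights, h1_model, h2_model)
print("Bayes Factor = {:.2f}".format(np.exp(log_bf)))
```

Working in log space keeps the sums numerically stable; the exponential is only taken at the very end.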

Calculating all that would be easy if I was using Stan or PyMC3, but I ran into problems translating the former's probability distributions into charts, and I don't have any experience with the latter. My next choice, emcee, forces me to manually convolve two posterior distributions. Annoying, but not difficult.

I'm a Model, If You Know What I Mean

That just leaves one thing: what models are we going to use? The obvious choice for height is the Gaussian distribution, as from previous research we know it's a great model.

import emcee
nwalkers, nsamples = 128, 6
models = dict()

import os
import scipy.stats as spst

ndim = 2
def lnLike_gaussian( theta, x ):
    mu, sigma = theta
    return np.sum( spst.norm( mu, sigma ).logpdf( x ) )

def lnPrior_gaussian( theta ):
    mu, sigma = theta
    if sigma <= 0:           # standard deviation must be positive
        return -np.inf
    return -2*np.log(sigma)  # favor lower standard deviations

def lnProb_gaussian( theta, x ):
    temp = lnPrior_gaussian( theta )
    if temp == -np.inf:
        return temp
    return temp + lnLike_gaussian( theta, x )

x = dataset['height'][mask_lt_5nmol & height_valid & mask_afab]
pos = [np.array([150,15]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]  # rough estimate

print("Fitting the height of lT aFab athletes to a Gaussian distribution ...")


lnprob = None
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_gaussian, threads=os.cpu_count(), args=[ x ] )
model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format(0, lnProb_gaussian( model_mean, x ), *model_mean))
for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )
    model_mean = np.mean( pos, axis=0 ) # fairer than going with the maximal likelihood
    print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format((it+1) * 64, lnProb_gaussian( model_mean, x ), *model_mean))

best = np.argmax( lnprob )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("ML", lnProb_gaussian( pos[best], x ), *pos[best]))
model_median = np.median( pos, axis=0 )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("median", lnProb_gaussian( model_median, x ), *model_median))

models["lT_aFab_height"] = pos     # store the posterior for later use

Fitting the height of lT aFab athletes to a Gaussian distribution ...
     0: (-980.322471) mu=150.000819, sigma=15.000177
    64: (-710.417497) mu=169.639051, sigma=8.579088
   128: (-700.539260) mu=171.107358, sigma=7.138832
   192: (-700.535241) mu=171.154151, sigma=7.133279
   256: (-700.540692) mu=171.152701, sigma=7.145515
   320: (-700.552831) mu=171.139668, sigma=7.166857
   384: (-700.530969) mu=171.086422, sigma=7.094077
    ML: (-700.525284) mu=171.155240, sigma=7.085777
median: (-700.525487) mu=171.134614, sigma=7.070993

Alas, emcee also lacks a good way to assess model fitness. One crude metric is to look at the progression of the mean fitness; if it grows and then stabilizes around a specific value, as it does here, we've converged on something. Another is to compare the mean, median, and maximal likelihood of the posterior; if they're about equally likely, we've got a fuzzy caterpillar. Again, that's also true here.
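
The second check can be sketched without running a sampler at all. In the snippet below the "walkers" are synthetic draws scattered around a known answer, a stand-in I'm assuming behaves roughly like a converged ensemble; it is not emcee output, and the variable names are hypothetical.

```python
import numpy as np
import scipy.stats as spst

def log_like(theta, x):
    mu, sigma = theta
    return np.sum(spst.norm(mu, sigma).logpdf(x))

rng = np.random.default_rng(0)
x = rng.normal(171.0, 7.0, size=200)              # synthetic "heights"

# stand-in for a converged ensemble: 128 walkers scattered near the truth
walkers = np.column_stack([rng.normal(171.0, 0.5, 128),
                           rng.normal(7.0, 0.35, 128)])

scores = np.array([log_like(w, x) for w in walkers])
summary = {
    "mean":   log_like(np.mean(walkers, axis=0), x),
    "median": log_like(np.median(walkers, axis=0), x),
    "best":   scores.max(),
}
spread = max(summary.values()) - min(summary.values())
print(summary, spread)
```

A small spread between the three summaries, relative to the size of the log likelihood itself, is the "about equally likely" situation described above. A large one would hint at a skewed or multi-modal posterior.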

As we just saw, though, charts are a better judge of fitness than a handful of numbers.

def plotWrapper( func, theta, x ):
    retVal = list()
    for data in x:
        retVal.append( np.exp( func( theta, data ) ) )
    return retVal

minVal = min(dataset['height'][height_valid])
maxVal = max(dataset['height'][height_valid])
    
x = np.linspace(minVal,maxVal,255)


bins = 9

pp.hist( dataset['height'][mask_afab & height_valid], bins, density=1, facecolor='black', alpha=0.2)
pp.legend(['lT aFab'], loc=0)
pp.plot( x, plotWrapper(lnLike_gaussian, np.mean(models['lT_aFab_height'],axis=0), x ), 'k' )

pp.title('Height, elite athletes')
pp.xlim([145,215])
pp.xlabel("cm")
pp.yticks([])

# source: https://ourworldindata.org/human-height, Germany, 1976
pp.axvline(166.3,0,1, color='k', alpha=0.2)
pp.axvline(np.mean(dataset['height'][mask_afab & height_valid]),0,1, color='k')

pp.show()

If you were wondering why I didn't make much of a fuss out of the asymmetry in the height distribution, it's because I've already seen this graph. A good fit isn't necessarily the best though, and I might be able to get a closer match by incorporating the sport each athlete played.

sport_names = { 
    1: 'Power lifting',
    2: 'Basketball',
    3: 'Football',
    4: 'Swimming',
    5: 'Marathon',
    6: 'Canoeing',
    7: 'Rowing',
    8: 'Cross-country skiing',
    9: 'Alpine skiing',
    10: 'Weight lifting',
    11: 'Judo',
    12: 'Bandy',
    13: 'Ice Hockey',
    14: 'Handball',
    15: 'Track and field'}

print("{:^48}".format("Assigned-female Athletes"))
print("{:^24} {:^23}".format("sport","below/above 171cm"))
for sport in pd.Categorical( dataset['sport'] ).categories:
    below = sum(dataset['sport'][mask_afab & height_valid & (dataset['height'] < 171)] == sport)
    above = sum(dataset['sport'][mask_afab & height_valid & (dataset['height'] >= 171)] == sport)
    print("{:>24}: {:2} /{:2}".format(sport_names[sport], below, above))

            Assigned-female Athletes            
         sport              below/above 171cm   
           Power lifting:  1 / 0
              Basketball:  2 /12
                Football:  0 / 0
                Swimming: 41 /49
                Marathon:  0 / 1
                Canoeing:  1 / 0
                  Rowing:  9 /13
    Cross-country skiing:  8 / 1
           Alpine skiing: 11 / 1
          Weight lifting:  7 / 0
                    Judo:  0 / 0
                   Bandy:  0 / 0
              Ice Hockey:  0 / 0
                Handball: 12 /17
         Track and field: 22 /27

Basketball attracts tall people, unsurprisingly, while skiing seems to attract shorter people. This could be the cause of that asymmetry. It's no guarantee that I'll actually get a better fit, though, as I'm also dramatically cutting the number of datapoints to fit to. The model's uncertainty must increase as a result, and that may be enough to dilute out any increase in fitness. I'll run those numbers for the paper, but for now the Gaussian model I have is plenty good.

x = dataset['height'][mask_gt_5nmol & height_valid & mask_amab]
pos = [np.array([150,15]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]  # rough estimate

print("Fitting the height of hT aMab athletes to a Gaussian distribution ...")


lnprob = None
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_gaussian, threads=os.cpu_count(), args=[ x ] )
model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format(0, lnProb_gaussian( model_mean, x ), *model_mean))
for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )
    model_mean = np.mean( pos, axis=0 ) # fairer than going with the maximal likelihood
    print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format((it+1) * 64, lnProb_gaussian( model_mean, x ), *model_mean))

best = np.argmax( lnprob )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("ML", lnProb_gaussian( pos[best], x ), *pos[best]))
model_median = np.median( pos, axis=0 )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("median", lnProb_gaussian( model_median, x ), *model_median))

models["hT_aMab_height"] = pos     # store the posterior for later use

Fitting the height of hT aMab athletes to a Gaussian distribution ...
     0: (-2503.079578) mu=150.000061, sigma=15.001179
    64: (-1482.315571) mu=179.740851, sigma=10.506003
   128: (-1451.789027) mu=182.615810, sigma=8.620333
   192: (-1451.748336) mu=182.587979, sigma=8.550535
   256: (-1451.759883) mu=182.676004, sigma=8.546410
   320: (-1451.746697) mu=182.626918, sigma=8.538055
   384: (-1451.747266) mu=182.580692, sigma=8.534070
    ML: (-1451.746074) mu=182.591047, sigma=8.534584
median: (-1451.759295) mu=182.603231, sigma=8.481894

We get the same results when fitting the model to >5 nmol/L assigned-male athletes. The log likelihood, that number in brackets, is a lot lower for these athletes, but that number is roughly proportional to the number of samples. If we had the same degree of model fitness but doubled the number of samples, we'd expect the log likelihood to double. And, sure enough, this dataset has roughly twice as many assigned-male athletes as it does assigned-female athletes.
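
The proportionality claim is easy to check on synthetic data. The sketch below uses made-up heights and a fixed Gaussian rather than the fitted posterior, so the specific numbers are assumptions; only the scaling behaviour matters.

```python
import numpy as np
import scipy.stats as spst

rng = np.random.default_rng(1)
x = rng.normal(180.0, 8.5, size=100)     # synthetic sample, not the athlete data
model = spst.norm(180.0, 8.5)

ll_once = np.sum(model.logpdf(x))
ll_twice = np.sum(model.logpdf(np.concatenate([x, x])))   # same fit, double the data

per_athlete = ll_once / len(x)           # a size-independent way to compare fits
print(ll_once, ll_twice, ll_twice / ll_once, per_athlete)
```

Dividing the log likelihood by the number of athletes gives a per-athlete figure, which is one way to compare model fitness across groups of different sizes.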

x = np.linspace(minVal,maxVal,255)

bins = 9

pp.hist( dataset['height'][mask_lt_5nmol & mask_afab & height_valid], bins, density=1, facecolor='black', alpha=0.2)
pp.hist( dataset['height'][mask_gt_5nmol & mask_amab & height_valid], bins, density=1, facecolor='green', alpha=0.2)
pp.legend(['lT aFab','hT aMab'], loc=0)
pp.plot( x, plotWrapper(lnLike_gaussian, np.mean(models['lT_aFab_height'],axis=0), x ), 'k' )
pp.plot( x, plotWrapper(lnLike_gaussian, np.mean(models['hT_aMab_height'],axis=0), x ), 'g' )

pp.title('Height, elite athletes')
pp.xlim([145,215])
pp.xlabel("cm")
pp.yticks([])

# source: https://ourworldindata.org/human-height, Germany, 1976
pp.axvline(166.3,0,1, color='k', alpha=0.2)
pp.axvline(np.mean(dataset['height'][mask_lt_5nmol & mask_afab & height_valid]),0,1, color='k')
pp.axvline(179.3,0,1, color='g', alpha=0.2)
pp.axvline(np.mean(dataset['height'][mask_gt_5nmol & mask_amab & height_valid]),0,1, color='g')

pp.show()

The updated charts are more of the same.

Unfortunately, adjusted BMI isn't nearly as tidy. I don't have any prior knowledge that would favour a particular model, so I wound up testing five candidates: the Gaussian, Log-Gaussian, Gamma, Weibull, and Rayleigh distributions. All but the first needed an offset parameter to get the best results, which has the same interpretation as last time.

ndim = 2

x = BMI_adj[mask_gt_5nmol & BMI_adj_valid & mask_amab]
pos = [np.array([15,5]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]  # rough estimate

print("Fitting the adjusted BMI of hT aMab athletes to a Gaussian distribution ...")


lnprob = None
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_gaussian, threads=os.cpu_count(), args=[ x ] )
model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format(0, lnProb_gaussian( model_mean, x ), *model_mean))
for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )
    
model_mean = np.mean( pos, axis=0 ) # fairer than going with the maximal likelihood
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}".format((it+1) * 64, lnProb_gaussian( model_mean, x ), *model_mean))
best = np.argmax( lnprob )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("ML", lnProb_gaussian( pos[best], x ), *pos[best]))
model_median = np.median( pos, axis=0 )
print("{:>6}: ({:5f}) mu={:7f}, sigma={:7f}".format("median", lnProb_gaussian( model_median, x ), *model_median))

models['hT_aMab_BMI_gaussian'] = pos

Fitting the adjusted BMI of hT aMab athletes to a Gaussian distribution ...
     0: (-410.901047) mu=14.999563, sigma=5.000388
   384: (-256.474147) mu=20.443497, sigma=1.783979
    ML: (-256.461460) mu=20.452817, sigma=1.771653
median: (-256.477475) mu=20.427138, sigma=1.781139

ndim = 3
def lnLike_loggaussian( theta, x ):
    mu, sigma, off = theta
    x_adj = x - off
    if np.any( x_adj < 0 ):
        return -np.inf
    # NOTE: this penalizes (x_adj - mu), not (log(x_adj) - mu) as a textbook log-normal would
    return np.sum( -.5*( ((x_adj-mu)/sigma)**2 ) - np.log( x_adj*sigma ) )

def lnPrior_loggaussian( theta ):
    mu, sigma, off = theta
    if (mu < 0) or (sigma <= 0):
        return -np.inf
    if (off < 0) or (off > 25):
        return -np.inf
    
    return -2*np.log(sigma)

def lnProb_loggaussian( theta, x ):
    temp = lnPrior_loggaussian( theta )
    if temp == -np.inf:
        return temp
    return temp + lnLike_loggaussian(theta, x)

pos = [np.array([7,2,10]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]  # rough estimate

print("Fitting the adjusted BMI of hT aMab athletes to a Log-Gaussian distribution ...")

lnprob = None
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_loggaussian, threads=os.cpu_count(), args=[ x ] )


model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}, off={:7f}".format(0, \
                                lnProb_loggaussian( model_mean, x ), *model_mean))
for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )

model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}, off={:7f}".format((it+1) * 64, \
                                lnProb_loggaussian( model_mean, x ), *model_mean))
best = np.argmax( lnprob )
print("{:>5}: ({:5f}) mu={:7f}, sigma={:7f}, off={:7f}".format("ML", \
                lnprob[best], *pos[best]) )
model_median = np.median( pos, axis=0 )
print("{:6}: ({:5f}) mu={:7f}, sigma={:7f}, off={:7f}".format("median", \
                                lnProb_loggaussian( model_median, x ), *model_median))

models['hT_aMab_BMI_loggaussian'] = pos

Fitting the adjusted BMI of hT aMab athletes to a Log-Gaussian distribution ...
     0: (-629.141577) mu=6.999492, sigma=2.001107, off=10.000768
   384: (-290.910651) mu=3.812746, sigma=1.789607, off=16.633741
   ML: (-277.119315) mu=3.848383, sigma=1.818429, off=16.637382
median: (-288.278918) mu=3.795675, sigma=1.778238, off=16.637076

import scipy as sp
import scipy.special   # make sure the sp.special submodule is actually loaded

ndim = 3
def lnLike_gammaoffset(theta, x):
    alpha, beta, off = theta
    x_adj = x - off
    if np.any( x_adj < 0 ):
        return -np.inf
    return np.sum( alpha*np.log(beta) - sp.special.loggamma(alpha) + (alpha-1)*np.log(x_adj) - beta*x_adj )

lnPrior_gammaoffset = lnPrior_loggaussian   # the two are similar enough to reuse
def lnProb_gammaoffset( theta, x ):
    temp = lnPrior_gammaoffset( theta )
    if temp == -np.inf:
        return temp
    return temp + lnLike_gammaoffset(theta, x)

pos = [np.array([20,3.,10.]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]
print("Fitting the adjusted BMI of hT aMab athletes to a Gamma distribution ...")

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_gammaoffset, threads=os.cpu_count(), args=[ x ] )

gammaoffset_mean = np.mean( pos, axis=0 )
print("{:5}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format(0, 
                        lnProb_gammaoffset( gammaoffset_mean, x ), *gammaoffset_mean))

for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )

model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format((it+1) * 64, \
                                lnProb_gammaoffset( model_mean, x ), *model_mean))
best = np.argmax( lnprob )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format("ML", \
                lnprob[best], *pos[best]) )
model_median = np.median( pos, axis=0 )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format("median", \
                                lnProb_gammaoffset( model_median, x ), *model_median))

models['hT_aMab_BMI_gamma'] = pos

Fitting the adjusted BMI of hT aMab athletes to a Gamma distribution ...
    0: (-564.227696) alpha=19.998389, beta=3.001330, off=9.999839
   384: (-256.999252) alpha=15.951361, beta=2.194827, off=13.795466
ML    : (-248.056301) alpha=8.610936, beta=1.673886, off=15.343436
median: (-249.115483) alpha=12.411010, beta=2.005287, off=14.410945

ndim = 3
def lnLike_weibulloffset(theta, x):
    k, beta, off = theta
    x_adj = x - off
    if np.any( x_adj < 0 ):
        return -np.inf
    # NOTE: the final term uses the unshifted x; a fully shifted Weibull would use (x_adj*beta)**k
    return np.sum( np.log(k*beta) + (k-1)*np.log(x_adj*beta) - (x*beta)**k )

lnPrior_weibulloffset = lnPrior_loggaussian
def lnProb_weibulloffset( theta, x ):
    temp = lnPrior_weibulloffset( theta )
    if temp == -np.inf:
        return temp
    return temp + lnLike_weibulloffset(theta, x)

pos = [np.array([8,.1,1.]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]
print("Fitting the adjusted BMI of hT aMab athletes to a Weibull distribution ...")

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_weibulloffset, threads=os.cpu_count(), args=[ x ] )

weibull_mean = np.mean( pos, axis=0 )
print("{:5}: ({:5f}) k={:7f}, beta={:7f}, off={:7f}".format(0,
            lnProb_weibulloffset( weibull_mean, x ), *weibull_mean))
for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )
    
model_mean = np.mean( pos, axis=0 )
print("{:>5}: ({:5f}) k={:7f}, beta={:7f}, off={:7f}".format((it+1) * 64, \
                                lnProb_weibulloffset( model_mean, x ), *model_mean))
best = np.argmax( lnprob )
print("{:>5}: ({:5f}) k={:7f}, beta={:7f}, off={:7f}".format("ML", \
                lnprob[best], *pos[best]) )
model_median = np.median( pos, axis=0 )
print("{:>5}: ({:5f}) k={:7f}, beta={:7f}, off={:7f}".format("median", \
                                lnProb_weibulloffset( model_median, x ), *model_median))

models['hT_aMab_BMI_weibull'] = pos

Fitting the adjusted BMI of hT aMab athletes to a Weibull distribution ...
    0: (-48865.772268) k=7.999859, beta=0.099877, off=0.999138
  384: (-271.350390) k=9.937527, beta=0.046958, off=0.019000
   ML: (-270.340284) k=9.914647, beta=0.046903, off=0.000871
median: (-270.974131) k=9.833793, beta=0.046947, off=0.011727

ndim = 2
def lnLike_rayleighoffset(theta, x):
    tau, off = theta
    x_adj = x - off
    if np.any( x_adj < 0 ):
        return -np.inf
    return np.sum( np.log(x_adj*tau) - .5*x_adj*x_adj*tau )

def lnPrior_rayleighoffset( theta ):
    tau, off = theta
    if (tau <= 0) or (off < 0):
        return -np.inf
    return 0

def lnProb_rayleighoffset( theta, x ):
    temp = lnPrior_rayleighoffset( theta )
    if temp == -np.inf:
        return temp
    return temp + lnLike_rayleighoffset(theta, x)

pos = [np.array([.5,10]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]
print("Fitting the adjusted BMI of hT aMab athletes to a Rayleigh distribution ...")
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_rayleighoffset, threads=os.cpu_count(), args=[ x ] )

rayleigh_mean = np.mean( pos, axis=0 )
print("{:5}: ({:5f}) tau={:7f}, off={:7f}".format(0,
            lnProb_rayleighoffset( rayleigh_mean, x ), *rayleigh_mean))

for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )
    
rayleigh_mean = np.mean( pos, axis=0 )
print("{:5}: ({:5f}) tau={:7f}, off={:7f}".format((it+1)*64,
            lnProb_rayleighoffset( rayleigh_mean, x ), *rayleigh_mean))
best = np.argmax( lnprob )
print("{:>5}: ({:5f}) tau={:7f}, off={:7f}".format("ML", lnprob[best], *pos[best]))
rayleigh_median = np.median( pos, axis=0 )
print("{:5}: ({:5f}) tau={:7f}, off={:7f}".format("median",
            lnProb_rayleighoffset( rayleigh_median, x ), *rayleigh_median))

models['hT_aMab_BMI_rayleigh'] = pos

Fitting the adjusted BMI of hT aMab athletes to a Rayleigh distribution ...
    0: (-3378.099000) tau=0.499136, off=9.999193
  384: (-254.717778) tau=0.107962, off=16.378780
   ML: (-253.012418) tau=0.110751, off=16.574934
median: (-253.092584) tau=0.108740, off=16.532576

minVal = min(BMI_adj[BMI_adj_valid])
maxVal = max(BMI_adj[BMI_adj_valid])
    
x = np.linspace(minVal,maxVal,255)


bins = 9

pp.hist( BMI_adj[mask_gt_5nmol & mask_amab & BMI_adj_valid], bins, density=1, facecolor='green', alpha=0.2)
pp.plot( x, plotWrapper(lnLike_gaussian, np.median(models['hT_aMab_BMI_gaussian'],axis=0), x ), 'k' )
pp.plot( x, plotWrapper(lnLike_loggaussian, np.median(models['hT_aMab_BMI_loggaussian'],axis=0), x ), 'r', alpha=.2 )
pp.plot( x, plotWrapper(lnLike_gammaoffset, np.median(models['hT_aMab_BMI_gamma'],axis=0), x ), 'g' )
pp.plot( x, plotWrapper(lnLike_weibulloffset, np.median(models['hT_aMab_BMI_weibull'],axis=0), x ), 'b', alpha=.2 )
pp.plot( x, plotWrapper(lnLike_rayleighoffset, np.median(models['hT_aMab_BMI_rayleigh'],axis=0), x ), 'y' )

pp.legend(['Gaussian','log-Gaussian + Offset','Gamma + Offset', 'Weibull + Offset','Rayleigh + Offset','high-T aMab'], loc=0)


pp.title('Adjusted BMI, elite athletes')
pp.xlim([minVal,maxVal])
pp.ylim([0,.3])
pp.yticks([])

pp.show()

Looks like the Gamma distribution is the best of the bunch, though only if you use the median or maximal likelihood of the posterior. There must be some outliers in there that are tugging the mean around. Visually, there isn't too much difference between the Gaussian and Gamma fits, but the Rayleigh seems artificially sharp on the low end. It's a bit of a shame: the Gamma distribution is usually related to rates and variance, so we don't have a good reason for applying it here, other than "it fits the best." We might be able to do better with a per-sport Gaussian distribution fit, but for now I'm happy with the Gamma.
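
For a rough cross-check of this kind of comparison, one can swap the MCMC fit for a plain maximum-likelihood fit. The sketch below does that with scipy's built-in `fit` method on synthetic, right-skewed data; it is not the procedure used in this post, the data is made up, and the dictionary names are hypothetical.

```python
import numpy as np
import scipy.stats as spst

rng = np.random.default_rng(2)
# synthetic, right-skewed data with a floor near 15; not the athlete dataset
x = 15.0 + rng.gamma(shape=3.0, scale=1.5, size=500)

candidates = {"gaussian": spst.norm, "gamma + offset": spst.gamma}
log_like = {}
for name, dist in candidates.items():
    params = dist.fit(x)                            # maximum-likelihood parameters
    log_like[name] = np.sum(dist.logpdf(x, *params))

best = max(log_like, key=log_like.get)
print(log_like, best)
```

Keep in mind that a model with an extra offset parameter has more freedom to chase the data, so a slightly higher log likelihood on its own is weak evidence that it is the better description.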

Time to fit the other pool of athletes, and chart it all.

x = BMI_adj[mask_lt_5nmol & BMI_adj_valid & mask_afab]

ndim = 3
pos = [np.array([20,3.,10.]) + 1e-2*np.random.randn(ndim) for i in range(nwalkers)]
print("Fitting the adjusted BMI of lT aFab athletes to a Gamma distribution ...")

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnProb_gammaoffset, threads=os.cpu_count(), args=[ x ] )

gammaoffset_mean = np.mean( pos, axis=0 )
print("{:5}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format(0, 
                        lnProb_gammaoffset( gammaoffset_mean, x ), *gammaoffset_mean))

for it in range(nsamples):
    pos, lnprob, _ = sampler.run_mcmc( pos, 64, storechain=False )

model_mean = np.mean( pos, axis=0 )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format((it+1) * 64, \
                                lnProb_gammaoffset( model_mean, x ), *model_mean))
best = np.argmax( lnprob )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format("ML", \
                lnprob[best], *pos[best]) )
model_median = np.median( pos, axis=0 )
print("{:6}: ({:5f}) alpha={:7f}, beta={:7f}, off={:7f}".format("median", \
                                lnProb_gammaoffset( model_median, x ), *model_median))

models['lT_aFab_BMI_gamma'] = pos

Fitting the adjusted BMI of lT aFab athletes to a Gamma distribution ...
    0: (-127.467934) alpha=20.000007, beta=3.000116, off=9.999921
   384: (-128.564564) alpha=15.481265, beta=3.161022, off=12.654149
ML    : (-117.582454) alpha=2.927721, beta=1.294851, off=14.713479
median: (-120.689425) alpha=11.961847, beta=2.836153, off=13.008723

x = np.linspace(minVal,maxVal,255)


bins = 9

pp.hist( BMI_adj[mask_lt_5nmol & mask_afab & BMI_adj_valid], bins, density=1, facecolor='black', alpha=0.2)
pp.hist( BMI_adj[mask_gt_5nmol & mask_amab & BMI_adj_valid], bins, density=1, facecolor='green', alpha=0.2)
pp.legend(['lT aFab','hT aMab'], loc=0)
pp.plot( x, plotWrapper(lnLike_gammaoffset, np.median(models['lT_aFab_BMI_gamma'],axis=0), x ), 'k', alpha=.5 )
pp.plot( x, plotWrapper(lnLike_gammaoffset, np.median(models['hT_aMab_BMI_gamma'],axis=0), x ), 'g', alpha=.5 )

pp.legend(['Gamma + Offset, lT aFab', 'Gamma + Offset, hT aMab','low-T aFab','high-T aMab'], loc=0)

pp.title('Adjusted BMI, elite athletes')
pp.xlim([minVal,maxVal])
pp.yticks([])

pp.show()

Those models look pretty reasonable, though the upper end of the assigned-female distribution could be improved on. It's a good enough fit to get some answers, at least.

The Nitty Gritty

It's easier to combine step 3, applying the model, with step 5, calculating the Bayes Factor, when writing the code. The resulting Bayes Factor has a probability distribution, as the uncertainty contained in the posterior contaminates it.
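
The pairing of every sample from one posterior with every sample from the other can be written compactly with NumPy broadcasting. This is only a sketch on synthetic numbers: the two arrays stand in for the per-walker log probabilities of a test group under each model, and their values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
# stand-ins for the per-walker log probabilities of the test group under each model
numer = rng.normal(-60.0, 0.5, size=128)
denom = rng.normal(-62.0, 0.5, size=128)

# every numerator paired with every denominator: 128 * 128 log Bayes Factors
log_bf = np.subtract.outer(numer, denom).ravel()

percentiles = np.exp(np.percentile(log_bf, [5, 16, 50, 84, 95]))
favour = np.mean(log_bf > 0)           # share of pairs with a Bayes Factor above 1
print(len(log_bf), percentiles, favour)
```

The result is a cloud of Bayes Factors rather than a single number, which is why the summaries that follow are percentiles and proportions.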

logBF_gt5_afab_height = list()

gt5_afab_height = dataset['height'][mask_gt_5nmol & height_valid & mask_afab]

numer_gt5_afab_height = [lnProb_gaussian( pos, gt5_afab_height ) for pos in models['lT_aFab_height']]
denom_gt5_afab_height = [lnProb_gaussian( pos, gt5_afab_height ) for pos in models['hT_aMab_height']]

for laf in numer_gt5_afab_height:
    for ham in denom_gt5_afab_height:
        logBF_gt5_afab_height.append( laf-ham )

print("Summary of the BF distribution, for the height of >5nmol/L aFab athletes")
        
percentiles = np.percentile( logBF_gt5_afab_height, [5,16,50,84,95] )
geo_mean = np.exp(np.mean( logBF_gt5_afab_height ))

temp = np.exp(logBF_gt5_afab_height)
mean = np.mean( temp )
favour = np.sum( temp > 1 ) / len(temp)
decisive = np.sum( temp > 19 ) / len(temp)

print("{:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10}".format(
    "n","mean","geo.mean","5%","16%","50%","84%","95%"))
print("{:>10} {:10.2f} {:10.2f} {:10.2f} {:10.2f} {:10.2f} {:10.2f} {:10.2f}".format(
    len(gt5_afab_height), mean, geo_mean, *np.exp(percentiles) ))
print()
print("Percentage of BF's that favoured the primary hypothesis: {:.2f}%".format(favour*100.))
print("Percentage of BF's that were 'decisive': {:.2f}%".format(decisive*100.))

bins = 21
pp.hist( logBF_gt5_afab_height, bins, density=1, facecolor='black', alpha=0.2)
pp.legend(['high-T aFab'], loc=0)

pp.axvline( x=percentiles[2], color='r' )
pp.axvline( x=percentiles[1], color='r', alpha=.5 )
pp.axvline( x=percentiles[3], color='r', alpha=.5 )
pp.axvline( x=percentiles[0], color='r', alpha=.1 )
pp.axvline( x=percentiles[4], color='r', alpha=.1 )

pp.xticks( np.linspace(-2,6,5), ["{:.1f}".format(x) for x in np.exp(np.linspace(-2,6,5))] )
pp.yticks([])
pp.title('Bayes factor, height, >5nmol/L aFab athletes')
pp.show()

Summary of the BF distribution, for the height of >5nmol/L aFab athletes
         n       mean   geo.mean         5%        16%        50%        84%        95%
        19      10.64       5.44       0.75       1.76       5.66      17.33      35.42

Percentage of BF's that favoured the primary hypothesis: 92.42%
Percentage of BF's that were 'decisive': 14.17%

That looks a lot like a log-Gaussian distribution. The arithmetic mean fails us here, thanks to the huge range of values, so the geometric mean and median are better measures of central tendency.
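As a quick illustration of why the arithmetic mean misbehaves, here is a minimal sketch using synthetic values. The location and scale of the simulated log Bayes factors are assumptions picked only to resemble the summary above; they are not fitted to the athlete data.

```python
import numpy as np

# Hypothetical log Bayes factors drawn from a Gaussian, so the Bayes
# factors themselves are log-Gaussian. loc and scale are illustrative.
rng = np.random.default_rng(0)
log_bf = rng.normal(loc=1.7, scale=1.2, size=100_000)
bf = np.exp(log_bf)

arithmetic_mean = bf.mean()              # dragged upward by the long right tail
geometric_mean = np.exp(log_bf.mean())   # the mean taken in log space
median = np.median(bf)

print("arithmetic mean: {:.2f}".format(arithmetic_mean))
print("geometric mean:  {:.2f}".format(geometric_mean))
print("median:          {:.2f}".format(median))
```

For a log-Gaussian distribution the geometric mean and the median estimate the same quantity, while the arithmetic mean sits well above both.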

The best way I can interpret this result is via an eight-sided die: our credence in the hypothesis that >5nmol/L aFab athletes are more like their >5nmol/L aMab peers than their <5nmol/L aFab ones is similar to the credence we'd place on rolling a one via that die, while our credence on the primary hypothesis is similar to rolling any other number except one. About 92% of the calculated Bayes Factors were favourable to the primary hypothesis, and about 14% of them crossed the 19:1 threshold, a close match for the asserted evidential bar in science.
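For anyone who wants to translate these numbers into credences themselves, here is a small sketch. It assumes even prior odds between the two hypotheses, which is my simplification for illustration; the helper name is hypothetical.

```python
def credence_from_bayes_factor(bayes_factor, prior_odds=1.0):
    """Convert a Bayes factor into a posterior probability.

    prior_odds=1.0 (no initial preference for either hypothesis)
    is an assumption made for this illustration.
    """
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Values taken from the summary table above, plus two reference points.
examples = [
    ("geometric mean BF", 5.44),
    ("median BF", 5.66),
    ("eight-sided die, anything but a one", 7.0),
    ("19:1 threshold", 19.0),
]
for label, bf in examples:
    print("{:<38} {:.3f}".format(label, credence_from_bayes_factor(bf)))
```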

That's strong evidence for a mere 19 athletes, though not quite conclusive. How about the Bayes Factor for the height of <5nmol/L aMab athletes?


logBF_lt5_amab_height = list()

lt5_amab_height = dataset['height'][mask_lt_5nmol & height_valid & mask_amab]

numer_lt5_amab_height = [lnProb_gaussian( pos, lt5_amab_height ) for pos in models['hT_aMab_height']]
denom_lt5_amab_height = [lnProb_gaussian( pos, lt5_amab_height ) for pos in models['lT_aFab_height']]

for laf in numer_lt5_amab_height:
    for ham in denom_lt5_amab_height:
        logBF_lt5_amab_height.append( laf-ham )

print("Summary of the BF distribution, for the height of <5nmol/L aMab athletes")

percentiles = np.percentile( logBF_lt5_amab_height, [5,16,50,84,95] )
geo_mean = np.exp(np.mean( logBF_lt5_amab_height ))

temp = np.exp(logBF_lt5_amab_height)
mean = np.mean( temp )
favour = np.sum( temp > 1 ) / len(temp)
decisive = np.sum( temp > 19 ) / len(temp)

print("{:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10}".format(
    "n","mean","geo.mean","5%","16%","50%","84%","95%"))
print("{:>10} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e}".format(
    len(lt5_amab_height), mean, geo_mean, *np.exp(percentiles) ))
print()
print("Percentage of BF's that favoured the primary hypothesis: {:.2f}%".format(favour*100.))
print("Percentage of BF's that were 'decisive': {:.2f}%".format(decisive*100.))

bins = 21
pp.hist( logBF_lt5_amab_height, bins, density=1, facecolor='black', alpha=0.2)
pp.legend(['low-T aMab'], loc=0)

pp.axvline( x=percentiles[2], color='r' )
pp.axvline( x=percentiles[1], color='r', alpha=.5 )
pp.axvline( x=percentiles[3], color='r', alpha=.5 )
pp.axvline( x=percentiles[0], color='r', alpha=.1 )
pp.axvline( x=percentiles[4], color='r', alpha=.1 )

pp.xticks( np.linspace(30,55,7), ["{:.1e}".format(x) for x in np.exp(np.linspace(30,55,7))], rotation='vertical' )
pp.yticks([])
pp.title('Bayes factor, height, <5nmol/L aMab athletes')
pp.show()

Summary of the BF distribution, for the height of <5nmol/L aMab athletes
         n       mean   geo.mean         5%        16%        50%        84%        95%
        26   4.67e+21   3.49e+18   5.67e+14   2.41e+16   5.35e+18   4.16e+20   4.61e+21

Percentage of BF's that favoured the primary hypothesis: 100.00%
Percentage of BF's that were 'decisive': 100.00%

Wow! Even with 26 data points, our primary hypothesis was extremely well supported. Betting against that hypothesis is like betting a particular person in the US will be hit by lightning three times in a single year!

That seems a little too favourable to my view, though. Did something go wrong with the mathematics? The simplest check is to graph the models against the data they're evaluating.


minVal = min(dataset['height'][height_valid])
maxVal = max(dataset['height'][height_valid])
x = np.linspace(minVal,maxVal,255)


bins = 8
pp.hist( lt5_amab_height, bins, density=1, facecolor='green', alpha=0.2)

pp.plot( x, plotWrapper(lnLike_gaussian, np.mean(models['lT_aFab_height'],axis=0), x ), 'k' )
pp.plot( x, plotWrapper(lnLike_gaussian, np.mean(models['hT_aMab_height'],axis=0), x ), 'g' )

pp.legend(['low-T aFab', 'high-T aMab', 'low-T aMab'], loc=0)

pp.yticks([])
# pp.ylim([0,.45])
pp.title('Height, elite athletes, <5nmol/L aMab athletes')
pp.show()

Nope, the underlying data genuinely is a better fit for the high-testosterone aMab model. But that good of a fit? In linear space, we multiply each of the individual probabilities to arrive at the Bayes factor. That's equivalent to raising the geometric mean to the nth power, where n is the number of athletes. Since n = 26 here, even a geometric mean barely above one can generate a big Bayes factor.
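To make that concrete, here is a minimal sketch with made-up numbers. The per-athlete likelihood ratios are hypothetical, spread evenly between 1.1 and 1.9; only the athlete count of 26 comes from the data.

```python
import numpy as np

n = 26  # number of low-T aMab athletes with a valid height

# Hypothetical per-athlete likelihood ratios, each only mildly
# favouring one model over the other.
ratios = np.linspace(1.1, 1.9, n)
log_ratios = np.log(ratios)

# Multiplying the ratios in linear space is summing them in log space.
bayes_factor = np.exp(log_ratios.sum())

# The geometric mean is the exponential of the mean log ratio.
geo_mean = np.exp(log_ratios.mean())

print("geometric mean of the ratios: {:.3f}".format(geo_mean))
print("product of the ratios:        {:.3e}".format(bayes_factor))
print("geometric mean ** n:          {:.3e}".format(geo_mean ** n))
```

The last two lines print the same value, which is the point: a per-athlete ratio that looks unimpressive on its own compounds quickly over 26 athletes.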


temp = [lnProb_gaussian( np.median(models['hT_aMab_height'],axis=0), x ) - \
       lnProb_gaussian( np.median(models['lT_aFab_height'],axis=0), x ) for x in lt5_amab_height]

print( "{}th root of the median Bayes factor of the high-T aMab model applied to low-T aMab athletes: {:.4f}".format(len(lt5_amab_height), \
                                                            np.exp(percentiles[2]/len(temp))) )
print( "{}th root of the Bayes factor for the median marginal: {:.4f}".format(len(lt5_amab_height), \
                                                            np.exp(np.mean(temp))) )

26th root of the median Bayes factor of the high-T aMab model applied to low-T aMab athletes: 5.2519
26th root of the Bayes factor for the median marginal: 3.6010

Note that the Bayes factor we generate by using the median of the marginal for each parameter isn't as strong as the median Bayes factor in the above convolution. That's simply because I'm using a small sample from the posterior distribution. Keeping more samples would have brought those two values closer together, but also greatly increased the amount of computation I needed to do to generate all those Bayes factors.

With that check out of the way, we can move on to $\overline{BMI}$.


logBF_gt5_afab_BMI = list()
gt5_afab_BMI_invalid = [0,0]

gt5_afab_BMI = BMI_adj[mask_gt_5nmol & BMI_adj_valid & mask_afab]
numer_gt5_afab_BMI = [lnProb_gammaoffset( pos, gt5_afab_BMI ) for pos in models['lT_aFab_BMI_gamma']]
denom_gt5_afab_BMI = [lnProb_gammaoffset( pos, gt5_afab_BMI ) for pos in models['hT_aMab_BMI_gamma']]

for laf in numer_gt5_afab_BMI:
    for ham in denom_gt5_afab_BMI:
        if not np.isfinite(laf):
            gt5_afab_BMI_invalid[0] += 1
            continue
        elif not np.isfinite(ham):
            gt5_afab_BMI_invalid[1] += 1
            continue
        else:
            logBF_gt5_afab_BMI.append( laf-ham )
        
print("Summary of the BF distribution, for the adjusted BMI of >5nmol/L aFab athletes")
        
percentiles = np.percentile( logBF_gt5_afab_BMI, [5,16,50,84,95] )
geo_mean = np.exp( np.mean(logBF_gt5_afab_BMI) )

temp = np.exp(logBF_gt5_afab_BMI)
mean = np.mean( temp )
favour = np.sum( temp > 1 ) / len(temp)
decisive = np.sum( temp > 19 ) / len(temp)

print("{:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10}".format(
    "n","mean","geo.mean","5%","16%","50%","84%","95%"))
print("{:>10} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e}".format(
    len(gt5_afab_BMI), mean, geo_mean, *np.exp(percentiles) ))
print()
print("Percentage of BF's that favoured the primary hypothesis: {:.2f}%".format(favour*100.))
print("Percentage of BF's that were 'decisive': {:.2f}%".format(decisive*100.))
print("Percentage of non-finite probabilities, when applying the low-T aFab model to high-T aFab athletes: {:.2f}%".format(\
                gt5_afab_BMI_invalid[0]*100./(sum(gt5_afab_BMI_invalid)+len(logBF_gt5_afab_BMI)) ))
print("Percentage of non-finite probabilities, when applying the high-T aMab model to high-T aFab athletes: {:.2f}%".format(\
                gt5_afab_BMI_invalid[1]*100./(sum(gt5_afab_BMI_invalid)+len(logBF_gt5_afab_BMI)) ))

bins = 20
pp.hist( logBF_gt5_afab_BMI, bins, density=1, facecolor='black', alpha=0.2)
pp.legend(['high-T aFab'], loc=0)

pp.axvline( x=percentiles[2], color='r' )
pp.axvline( x=percentiles[1], color='r', alpha=.5 )
pp.axvline( x=percentiles[3], color='r', alpha=.5 )
pp.axvline( x=percentiles[0], color='r', alpha=.1 )
pp.axvline( x=percentiles[4], color='r', alpha=.1 )

pp.xticks( np.linspace(0,50,7), ["{:.1e}".format(x) for x in np.exp(np.linspace(0,50,7))], rotation='vertical' )
pp.yticks([])
pp.title('Bayes factor, BMI, >5nmol/L aFab athletes')
pp.show()

Summary of the BF distribution, for the adjusted BMI of >5nmol/L aFab athletes
         n       mean   geo.mean         5%        16%        50%        84%        95%
         4   1.70e+12   1.06e+05   2.31e+02   1.60e+03   4.40e+04   3.66e+06   3.99e+09

Percentage of BF's that favoured the primary hypothesis: 100.00%
Percentage of BF's that were 'decisive': 99.53%
Percentage of non-finite probabilities, when applying the low-T aFab model to high-T aFab athletes: 0.00%
Percentage of non-finite probabilities, when applying the high-T aMab model to high-T aFab athletes: 10.94%

This distribution is much stranger, with a number of extremely high BF's that badly skew the mean. The offset contributes to this, with 7-12% of the model posteriors for high-T aMab athletes assigning a zero percent likelihood to an adjusted BMI. Those are excluded from the analysis, but they suggest the high-T aMab model poorly describes high-T aFab athletes.
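Here is a sketch of how an offset produces those non-finite values. The parameterisation below (shape, scale, offset) and the data values are hypothetical stand-ins, and not necessarily what `lnProb_gammaoffset` does internally.

```python
import math

def ln_pdf_gamma_offset(x, shape, scale, offset):
    """Log density of a gamma distribution shifted right by `offset`.

    Hypothetical parameterisation, for illustration only.
    """
    z = x - offset
    if z <= 0:
        # Nothing below the offset has any probability; the boundary is
        # also treated as zero density, which holds for shape > 1.
        return -math.inf
    return ((shape - 1) * math.log(z) - z / scale
            - math.lgamma(shape) - shape * math.log(scale))

data = [18.5, 20.1, 21.3, 22.7]     # made-up adjusted BMI values

offset_below_data = (4.0, 1.5, 17.0)   # offset under the smallest datum
offset_above_datum = (4.0, 1.5, 19.0)  # offset above the smallest datum

total_ok = sum(ln_pdf_gamma_offset(x, *offset_below_data) for x in data)
total_bad = sum(ln_pdf_gamma_offset(x, *offset_above_datum) for x in data)

print("offset below all data:", total_ok)
print("offset above one datum:", total_bad)
```

A single datum below the offset is enough to send the whole log-likelihood to negative infinity, which is why those posterior samples have to be set aside before taking differences.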

Our credence in the primary hypothesis here is similar to our credence that an elite golfer will not land a hole-in-one on their next shot. That's surprisingly strong, given we're only dealing with four datapoints. More data may water that down, but it's unlikely to overcome that extreme level of credence.


logBF_lt5_amab_BMI = list()
lt5_amab_BMI_invalid = [0,0]

lt5_amab_BMI = BMI_adj[mask_lt_5nmol & BMI_adj_valid & mask_amab]
numer_lt5_amab_BMI = [lnProb_gammaoffset( pos, lt5_amab_BMI ) for pos in models['hT_aMab_BMI_gamma']]
denom_lt5_amab_BMI = [lnProb_gammaoffset( pos, lt5_amab_BMI ) for pos in models['lT_aFab_BMI_gamma']]

for laf in numer_lt5_amab_BMI:
    for ham in denom_lt5_amab_BMI:
        if not np.isfinite(laf):
            lt5_amab_BMI_invalid[0] += 1
            continue
        elif not np.isfinite(ham):
            lt5_amab_BMI_invalid[1] += 1
            continue
        else:
            logBF_lt5_amab_BMI.append( laf-ham )
        
print("Summary of the BF distribution, for the adjusted BMI of <5nmol/L aMab athletes")

percentiles = np.percentile( logBF_lt5_amab_BMI, [5,16,50,84,95] )
geo_mean = np.exp( np.mean(logBF_lt5_amab_BMI) )

temp = np.exp(logBF_lt5_amab_BMI)
mean = np.mean( temp )
favour = np.sum( temp > 1 ) / len(temp)
decisive = np.sum( temp > 19 ) / len(temp)

print("{:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10}".format(
    "n","mean","geo.mean","5%","16%","50%","84%","95%"))
print("{:>10} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e} {:10.2e}".format(
    len(lt5_amab_BMI), mean, geo_mean, *np.exp(percentiles) ))
print()
print("Percentage of BF's that favoured the primary hypothesis: {:.2f}%".format(favour*100.))
print("Percentage of BF's that were 'decisive': {:.2f}%".format(decisive*100.))
print("Percentage of non-finite probabilities, when applying the high-T aMab model to low-T aMab athletes: {:.2f}%".format(\
                lt5_amab_BMI_invalid[0]*100./(sum(lt5_amab_BMI_invalid)+len(logBF_lt5_amab_BMI)) ))
print("Percentage of non-finite probabilities, when applying the low-T aFab model to low-T aMab athletes: {:.2f}%".format(\
                lt5_amab_BMI_invalid[1]*100./(sum(lt5_amab_BMI_invalid)+len(logBF_lt5_amab_BMI)) ))

bins = 20
pp.hist( logBF_lt5_amab_BMI, bins, density=1, facecolor='black', alpha=0.2)
pp.legend(['low-T aMab'], loc=0)

pp.axvline( x=percentiles[2], color='r' )
pp.axvline( x=percentiles[1], color='r', alpha=.5 )
pp.axvline( x=percentiles[3], color='r', alpha=.5 )
pp.axvline( x=percentiles[0], color='r', alpha=.1 )
pp.axvline( x=percentiles[4], color='r', alpha=.1 )

pp.xticks( np.linspace(20,100,10), ["{:.1e}".format(x) for x in np.exp(np.linspace(20,100,10))], rotation='vertical' )
pp.yticks([])
pp.title('Bayes factor, BMI, <5nmol/L aMab athletes')
pp.show()

Summary of the BF distribution, for the adjusted BMI of <5nmol/L aMab athletes
         n       mean   geo.mean         5%        16%        50%        84%        95%
         9   6.64e+35   2.07e+22   4.05e+12   4.55e+16   6.31e+21   7.72e+27   9.81e+32

Percentage of BF's that favoured the primary hypothesis: 100.00%
Percentage of BF's that were 'decisive': 100.00%
Percentage of non-finite probabilities, when applying the high-T aMab model to low-T aMab athletes: 0.00%
Percentage of non-finite probabilities, when applying the low-T aFab model to low-T aMab athletes: 0.00%

The hypotheses' Bayes factor for the adjusted BMI of low-testosterone aMab athletes is much better behaved. Even here, the credence is above three-lightning-strikes territory, pretty decisively favouring the hypothesis.

Our final step would normally be to combine all these individual Bayes factors into a single one. That involves multiplying them all together, however, and a small number greater than one multiplied by a very large one is an even larger one. It isn't worth the effort; the conclusion is pretty obvious.
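For the curious, here is a rough sketch of what that combination would look like, using only the median Bayes factors from the four summary tables. Multiplying medians is a shortcut rather than the median of the full combined distribution, and it assumes the four results can be treated as independent, which I have not checked.

```python
import math

# Median Bayes factors copied from the four summary tables above.
median_bf = {
    "height, >5nmol/L aFab": 5.66,
    "height, <5nmol/L aMab": 5.35e18,
    "adjusted BMI, >5nmol/L aFab": 4.40e4,
    "adjusted BMI, <5nmol/L aMab": 6.31e21,
}

# Multiplying Bayes factors is adding their logarithms, which keeps
# the arithmetic well away from floating-point overflow.
log_combined = sum(math.log(bf) for bf in median_bf.values())
log10_combined = log_combined / math.log(10)

print("combined Bayes factor is roughly 10^{:.1f}".format(log10_combined))
```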

Truth and Consequences

Our primary hypothesis is on quite solid ground: athletes with exceptional testosterone levels are more like athletes of the same sex with typical testosterone levels than they are like athletes of a different sex with similar testosterone levels. If we divide up sports by testosterone level, then, roughly 6-8% of assigned-male athletes will wind up in the <5 nmol/L group, and about the same share of assigned-female athletes will be in the >5 nmol/L group. Note, however, that it doesn't follow that 6-8% of those in the <5 nmol/L group will be assigned-male. About 41% of the athletes at the 2018 Olympics were assigned-female, for instance. If we fix the rate of exceptional testosterone levels at 7%, and assume PyeongChang's rate is typical, a quick application of Bayes' theorem reveals

$$ \begin{align} p( \text{aMab} \mid \text{<5nmol/L} ) &= \frac{ p( \text{<5nmol/L} \mid \text{aMab} ) p( \text{aMab} ) }{ p( \text{<5nmol/L} \mid \text{aMab} ) p( \text{aMab} ) + p( \text{<5nmol/L} \mid \text{aFab} ) p( \text{aFab} ) } \\ {} &= \frac{ 0.07 \cdot 0.59 }{ 0.07 \cdot 0.59 + 0.93 \cdot 0.41 } \\ {} &\approx 9.8\% \end{align} $$

If all those assumptions are accurate, about 10% of <5 nmol/L athletes will be assigned-male, more-or-less matching the number I calculated way back at the start. In sports where performance is heavily correlated with height or $\overline{BMI}$, then, the 10% of assigned-male athletes in the <5 nmol/L group will heavily dominate the rankings. The odds of a woman earning recognition in such a sport are negligible, leading many of them to drop out. This increases the proportion of men in that sport, leading to more domination of the rankings, more women dropping out, and a nasty feedback loop.

Conversely, about 5% of >5nmol/L athletes will be assigned-female. In a heavily-correlated sport, those women will be outclassed by the men and have little chance of earning recognition for their achievements. They have no incentive to compete, so they'll likely drop out or avoid these sports as well.
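Both of those percentages come out of the same calculation, which can be sketched as follows. The inputs are the ones stated above (a 7% rate of exceptional testosterone levels and a 41% assigned-female share); the function name is just a hypothetical helper.

```python
def share_of_group(rate_in_a, prior_a, rate_in_b, prior_b):
    """Bayes' theorem for two populations: p(a | group membership)."""
    joint_a = rate_in_a * prior_a
    joint_b = rate_in_b * prior_b
    return joint_a / (joint_a + joint_b)

p_afab = 0.41          # share of athletes who are assigned-female
p_amab = 1.0 - p_afab  # share of athletes who are assigned-male
exceptional = 0.07     # assumed rate of exceptional testosterone levels
typical = 1.0 - exceptional

# Assigned-male share of the <5nmol/L group.
p_amab_given_low = share_of_group(exceptional, p_amab, typical, p_afab)

# Assigned-female share of the >5nmol/L group.
p_afab_given_high = share_of_group(exceptional, p_afab, typical, p_amab)

print("p(aMab | <5nmol/L) = {:.1%}".format(p_amab_given_low))
print("p(aFab | >5nmol/L) = {:.1%}".format(p_afab_given_high))
```

Changing `exceptional` to 0.06 or 0.08 shows how sensitive both shares are to the assumed rate.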

In events where physicality has less or no correlation with sporting performance, these effects will be less pronounced or non-existent, of course. But this still translates into fewer assigned-female athletes competing than in the current system.

But it gets worse! We'd also expect an uptick in the number of assigned-female athletes doping, primarily with testosterone inhibitors to bring themselves just below the 5nmol/L line. Alternatively, high-testosterone aFab athletes may inject large doses of testosterone to bulk up and remain competitive with their assigned-male competitors.

By dividing up testosterone levels into only two categories, sporting authorities are implicitly stating that everyone within those categories is identical. A number of athletes would likely go to court to argue that boosting or inhibiting testosterone should be legal, provided they do not cross the 5nmol/L line. If they're successful, then either the rules around testosterone usage would be relaxed, or sporting authorities would be forced to subdivide these groups further. This would lead to an uptick in testosterone doping among all athletes, not just those assigned female.

Notice that assigned-male athletes don't have the same incentives to drop out, and in fact the low-testosterone subgroup may even be encouraged to compete as they have an easier path to sporting fame and glory. Sports where performance is heavily correlated with height or $\overline{BMI}$ will come to be dominated by men.

Let's Put a Bow On This One

[1:15] In a nutshell, I find the arguments and logic that currently permit transgender women to compete against biological women to be remarkably flawed, and I’m convinced that unless quickly rectified, this will KILL women’s sports.

[14:00] I don’t want to see the day when women’s athletics is dominated by Y chromosomes, but without a change in policy, that is precisely what’s going to happen.

It's rather astounding. Transgender athletes are not a problem, on several levels; as I've pointed out before, they've been allowed to compete in the category they identify with for over a decade in some places, and yet no transgender athlete has come to dominate any sport. The Olympics has held the door open since 2004, and not a single transgender athlete has ever openly competed as a transgender athlete. Rationality Rules, like other transphobes, is forced to cherry-pick and commit lies of omission among a handful of examples, inflating them to seem more significant than they actually are.

In response to this non-existent problem, Rationality Rules' proposed solution would accomplish the very thing he wants to avoid! You don't get that turned around if you're a rational person with a firm grasp on the science.

No, this level of self-sabotage is only possible if you're a clueless bigot who's ignorant of the relevant science, and so frightened of transgender people that your critical thinking skills abandon you. The vast difference between what Rationality Rules claims the science says, and what his own citations say, must be because he knows that if he puts on a good enough act nobody will check his work. Everyone will walk away assuming he's rational, rather than a scared, dishonest loon.

It's hard to fit any other conclusion to the data.

Rationality Rules Is Delusional

I glossed past something in my last post. Emphasis mine:

[9:18] You see, I absolutely understand why we have and still do categorize sports based upon sex, as it’s simply the case that the vast majority of males have significant athletic advantages over females, but strictly speaking it’s not due to their sex. It’s due to factors that heavily correlate with their sex, such as height, width, heart size, lung size, bone density, muscle mass, muscle fiber type, hemoglobin, and so on. Or, in other words, sports are not segregated due to chromosomes, they’re segregated due to morphology.

I think it’s time we had a look at his science on this. Of the eleven scientific studies I counted in RR’s citations, only two dealt with muscle fibre composition:

Oertel, Gisela. “Morphometric Analysis of Normal Skeletal Muscles in Infancy, Childhood and Adolescence: An Autopsy Study.” Journal of the Neurological Sciences 88, no. 1 (December 1, 1988): 303–13. https://doi.org/10.1016/0022-510X(88)90227-4.

Staron, Robert S., Fredrick C. Hagerman, Robert S. Hikida, Thomas F. Murray, David P. Hostler, Mathew T. Crill, Kerry E. Ragg, and Kumika Toma. “Fiber Type Composition of the Vastus Lateralis Muscle of Young Men and Women.” Journal of Histochemistry & Cytochemistry 48, no. 5 (May 2000): 623–29. https://doi.org/10.1177/002215540004800506.

From that, we can extract the key charts on fibre composition. I’ll dim the irrelevant sections.

“Rationality Rules STILL Doesn’t Understand Sports”

Picking apart Rationality Rules’ science has been well covered, both by myself and by others, so it’s refreshing to watch someone take an entirely different approach.

[9:54] They outlawed dunking the basketball, because Kareem Abdul-Jabbar won too many championships doing it. In nearly every possible example of a rule change in relation to individuals dominating, it comes only after that individual… well, dominated.

[10:16] In conclusion – to steal an ending as well – sport is not defined by fairness of starting point. If it was, we wouldn’t love sports. Sports are about the adversity, about overcoming the odds. It’s not about bleaching them into a robotic simulation in a computer.

Xevaris’ critique is more about the fundamental character of sport, like what it means to compete, and delves deep into history. It’s worth your time. I also want to point you to it because I cover similar territory in an upcoming post.

I really only have one complaint: there are no closed captions! There are plenty of reasons to keep them enabled on your videos, YouTubers.

Rationality Rules is a “Lying” Transphobe

There’s a reason why Rationality Rules keeps referencing EssenceOfThought; on YouTube, they’re by far his highest-profile critic, and have done the most comprehensive critique. Other YouTubers haven’t been silent, though, and today I’d like to highlight one of them. Rhetoric&Discourse do an excellent job of summing up one specific example of dishonest behavior. I’ll only quote their conclusion here:

[11:18] Stephen comes to this conclusion in the same way that he did the last time: comparing cis men to cis women, and by ignoring the actual literature that compares trans women to cis women. To make matters worse, Stephen lies about the content of the studies in order to push his anti-trans agenda. Stephen claimed that he understood what his biggest mistake was – comparing cis men to cis women, and concluding that this comparison applied to trans women who had undergone HRT – but this new video only shows that he either didn’t understand or he cares more about bashing on trans individuals than he cares about intellectual honesty.

Ouch. Go and watch the video to get the full argument. In the meantime, I’ll point out that out of RR’s nearly forty sources used for his latest video, only two were scientific studies of transgender athletes, and only one of those actually evaluated their performance!

Gooren, Lj, and Mc Bunck. “Transsexuals and Competitive Sports.” European Journal of Endocrinology, October 1, 2004, 425–29. https://doi.org/10.1530/eje.0.1510425.

Harper, J. “Race Times for Transgender Athletes.” Journal of Sporting Cultures and Identities 6, no. 1 (2015): 1–9.

Their conclusions?

The pivotal question is this: can reassigned transsexuals compete fairly with others of their new sex? Our data are limited and do not provide insight into all pertinent aspects. In competitive sports, in all likelihood, small differences may be critical for winning or losing. Our analysis is not refined enough to detect these small differences, allowing only an approximation. As far as our data allow conclusions, the answer for F–M is probably yes, provided the administration of testosterone has not generated and does not generate supraphysiological testosterone levels, as these levels and exercise induce a surplus in muscle mass over exercise alone (…). For M–F, there is an element of arbitrariness. There is no conclusive evidence pro or con that the prenatal/perinatal testosterone exposure of men has an impact on future physical traits. […] In real life, there will always be an element of arbitrariness in the drawing of competitive lines. Different individuals are born with and develop postnatally different potentials. The caprices of genetics and postnatal development will make any form of competition intrinsically unfair at some level. In the studies of Bhasin and colleagues (…), changes in muscle size correlated with testosterone dose and concentration. These correlations were established in groups of men receiving graded doses of testosterone. Nevertheless, there was considerable heterogeneity in response to testosterone administration within each group of men receiving the same amount of testosterone. These individual differences in response to androgen administration might reflect differences in activity level, testosterone metabolism and nutrition, or polymorphisms in androgen receptor, myostatin, 5α-reductase or other muscle growth regulators, all genetically determined and inherently personal. The implication is that all men and women are not born equally endowed for competition in sports.

(Gooren 2004)

Despite the fact that transgender women have been allowed to compete against cisgender ones since 2004, there has been no study used to justify this decision beyond the original work of Gooren and Bunck. It bears repeating that this original study was not undertaken on athletes, nor did it directly measure any aspect of athleticism. In fact, this is the first time a study has been developed to measure the performance of transgender athletes. […]

The author chose to use the standard age-grading methodology which is commonly used in master’s (over forty) track meets worldwide, to evaluate the performance of eight distance runners who had undergone gender transition from male to female. As a group, the eight study participants had remarkably similar age grade scores in both male and female gender, making it possible to state that transgender women run distance races at approximately the same level, for their respective gender, both before and after gender transition.

It should be noted that this conclusion only applies to distance running and the author makes no claims as to the equality of performances, pre and post gender transition, in any other sport. As such, the study cannot, unequivocally, state that it is fair to allow to transgender women to compete against 46,XX women in all sports, although the study does make a powerful statement in favor of such a position.

(Harper 2015)

Compare this with what Rationality Rules concludes.

[19:07] In some events – such as long-distance running, in which hemoglobin and slow-twitch muscle fibers are vital – I think there’s a strong argument to say no, [transgender women who transitioned after puberty] don’t have an unfair advantage, as the primary attributes are sufficiently mitigated. But in most events, and especially those in which height, width, hip size, limb length, muscle mass, and muscle fiber type are the primary attributes – such as weightlifting, sprinting, hammer throw, javelin, netball, boxing, karate, basketball, rugby, judo, rowing, hockey, and many more – my answer is yes, most do have an unfair advantage. There are exceptions, of course, just as there are exceptions to everything, but far more often than not their TUE in such events does not bring them down to anywhere near the 100% mark. […]

[20:09] I am opposed [to these athlete’s inclusion] because the attributes which are granted from male puberty that play a vital role in some events have not been shown to be sufficiently mitigated by HRT.

I dunno about you, but I find it comforting that the only way you can justify excluding transgender athletes is to misrepresent the relevant science.

Rationality Rules is “A Transphobic Hack”

Looks like my initial assessment of Rationality Rules’ second attempt at transgender athletes got it right.

I just want to start this video noting a very simple fact. Whilst Stephen Woodford’s latest video is over 21 minutes long, when I accounted for arguments already refuted, the new content only amounted to just 6:34. What’s more is that said new content contains zero arguments. It’s purely him dishonestly framing his opposition and the example he asks us to keep in mind as he opens his video.

The only two arguments he makes in his video are the bait and switch I dealt in my original response. And an attempt to justify this by shirking the burden of proof, something I dealt with in my response to Woodford’s ‘Mistakes of Many’ video.

Think about that: RR had two months to research counter-arguments and strengthen his stance, and instead chose to ignore all his critics and push the same arguments. The only changes he made were to move the goalposts. As one example, the original video contained these statements:

I’m convinced that, unless quickly rectified, [the inclusion of transgender women] will quickly kill women’s sport.

I don’t want to see the day when women’s athletics is dominated by Y chromosomes, but without a change in policy that is precisely what is going to happen.

He has never acknowledged those statements in any subsequent video, nor apologized for them. By removing them from the public record, he makes his stance look more reasoned; because his opponents haven’t removed their critiques, they now look like they’re overreacting.

Add in his now-usual tactic of dishonest editing to make his opponents’ views appear weaker than they are, and the new tactic of relying on talking points from religious far-Right organizations that joke about transgender people suffering painful deaths, and Rationality Rules’ replacement video is actually worse than the original!

How could he do something like this? Easy.

[48:55] How much damage does Woodford have to do to both trans people and the secular community before those who have been sitting on their hands, claiming we need to just give him time, finally take a stand? So rather than me ending by asking you questions, I’d like to offer a request. Start questioning the various content creators in the secular community as to why they still remain silent on the subject.

Because the only way we’re going to fix the secular community is if we actually begin holding its members accountable. People have asked me to consider how my attempts to hold Woodford accountable look to outsiders. Well can you?. How can we judge religious institutions for failing to tackle internal issues, whilst we see a coordinated effort to police marginalised voices in the secular community? My actions are not what makes the secular community look bad.

Rationality Rules knows he will not be held accountable for his dishonesty and harm. The Atheist Community of Austin tried to do a mild accounting, but was forced to back off due to public backlash from the community and a few high-profile members like Matt Dillahunty and AronRa who reflexively backed RR. And among high-profile groups and individuals, that’s it.

The message of the atheist/skeptic community is loud and clear: they will give your dishonesty and bigotry a pass if you’re popular enough and give the superficial appearance of caring about rational discourse. If you’re wondering why I continue to devote so much of my spare time to critiquing RR’s videos, it’s because I strongly disagree with the consensus of my community and I want it to change.

I should confess, however, that if you’d asked me the “why?” question a few weeks ago, I would have instead said that I dislike it when someone promotes misinformation, doubly dislike it when that person uses their rhetorical skills to make it tougher to respond, and triply dislike it when that person shares a community with me. My own thoughts have evolved thanks to Peter/Ethel of EssenceOfThought, and the time and effort they’ve put into critiquing RR. The quotes I’ve pulled from their latest video really don’t do it justice; I strongly recommend you watch the full thing.

And while doing so, think about how you’d like this community to behave.

[HJH 2019-06-23: Added a link to Matt Dillahunty’s tweet.]

[HJH 2019-06-23: Also added a link to EssenceOfThought’s summary of what happened to the ACA immediately after publishing their original statement.]

Rationality Rules is an Irrational Transphobe

The first rule of rational debate is to argue against the strongest argument your opponent has. Anyone can make their opponents’ position look weak by cherry-picking the worst of their points. In fact, if possible you should strengthen your opponent’s arguments before refuting them. “Steelmanning” has practically become a sacrament within the atheist/skeptic community, because we realize the value it brings to rational debate.

With that in mind, look at the title of Rationality Rules’ latest video: “Do Transgender Athletes Have an Unfair Advantage?” I already refuted the key premise of that six months ago!

This argument should be the focus when discussing trans athletes. It doesn’t matter if every single one of them is fifty feet tall; all that matters is whether you accept the existence of gender dysphoria as at least partly grounded in biology. If so, then the above argument demands you let them compete in the gender category they identify with. If that leads to situations you think are unfair, then you shouldn’t be using gender as a proxy for athletic ability, instead relying on metrics like muscle mass or height.

Rationality Rules claims to have spent months researching his new position, yet somehow he never stumbled on this? I can sort of understand him missing my post on it; sure, two blog posts of mine currently occupy the top five spots on Google for “Rationality Rules transphobe,” but maybe he didn’t bother doing that search. He can’t have been unaware of Peter/Ethel of EssenceOfThought, though, and they made a related argument:

[9:27] As a guiding tool I’d like to propose a simple principle. If your argument against trans women’s participation in sport can be used to bar cis women, said argument is inherently flawed and should be discarded. So if your argument would exclude cis women athletes such as Margo Dydek who was a 2.18m basketball player, you likely need a new argument.

And yet here is RR himself in the pinned comment to the video:

The “women’s category” is, in my opinion, poorly named given our current climate, and so I’d elect a name more along the lines of the “Under 5 nmol/l category” (as in, under 5 nanomoles of testosterone per litre), but make no mistake about it, the “woman’s category” is not based on gender or identity, or even genitalia or chromosomes… it’s based on hormone levels and the absence of male puberty.

While I didn’t point it out at the time, a graph in this post shows that a non-trivial number of cisgender women have testosterone levels above 5 nmol/L. Rationality Rules is not engaging with the best arguments of his critics; instead, he’s continuing his pattern of deceptive editing and cherry-picking to make the arguments of his opponents look weaker than they are. And we can determine all of that without even looking at the video!

I am planning to watch his latest video and critique it, by the way. I’m going to be ridiculously busy for the next few days, and I want to finish off a draft of another post first, so don’t expect it soon. But I wanted to plant this flag because Rationality Rules’ videos have and will continue to do a substantial amount of harm to women. It shouldn’t fall solely on the shoulders of transgender people to oppose that harm.

Rationality Rules is an Oblivious Transphobe

I have some regrets about my last post on Rationality Rules. I banged it out in just a few hours, while I was in the early stages of a nasty cold, and as a result I didn’t lay out all my arguments as clearly as I’d have liked. I should have more clearly stated that his behavior was more in line with how a transphobe would react to the situation than someone who wasn’t transphobic. Now that I’ve had the benefit of time and RR’s long-teased follow-up video, I’ve been able to reflect. As a result, I’ve refined my view of RR.

This new stance might not seem that charitable. After all, we’re talking about a video where RR says:

[1:58] I painted a picture of trans women essentially “stealing” competitions from non-trans women, and you’re absolutely right. I really dropped the ball here, and I will do my utmost best not to make this mistake again. In fact, going forward I’ll be very conscious of my narrative and language altogether, as such a sensitive topic requires nothing less. Truly, I should’ve known better. [2:20]

[9:23] … I absolutely recognise that my honest mistakes caused real harm, and for that I am sincerely sorry. The original video is now delisted and I’ve donated all of the ad revenue that it made to the transgender charity Sparkle. I know that it’ll never make up for the harm that I’ve caused, and that many of you will never consider me an ally again, I understand. [9:47]

He explicitly says a trans woman is a woman, too, at around the 1:40 mark. So why the harsh interpretation?

Rationality Rules is a Transphobe

Ugh, his apology video pissed me off. For instance:

[2:36] Anyhow, yesterday, the ACA, and its productions, published a statement from their board of directors, in which they accused me of making transphobic videos (despite the fact that I’ve only made one video), and they claimed that I’ve published numerous transphobic statements on my social media platforms, though, they neglected to provide any examples, because, to put it bluntly, there isn’t any. Not one. Zero. [3:02]

Fair enough, in the original statement they didn’t give specifics (in their replacement statement, they did). But compare that section with this one:

[1:09] But once I left the ACA’s warm hospitality to fly back to England, their Board of Directors released a public statement denouncing me as “transphobic” and heavily implied that I’m opposed to the LGBT+ community… which, as anyone who’s watched more than a handful of my videos will tell you, is slanderous hogwash. I’ve defended the LGBT+ community countless times throughout my short career, and whenever a religion or anyone denigrates them I point it out and crush it where it stands. I mean, I even have a video in which I explain precisely why homosexuality is natural, … [1:42]

So wait, nobody should take the ACA on faith that Rationality Rules made transphobic videos/posts, but Rationality Rules is allowed to assert he’s a strong defender of the LGBT+ community without providing evidence? Go ahead, check the video description; he can’t even be bothered to link to his video on gay people. Gays and lesbians are merely two letters of the four; there’s no conflict between defending them and throwing bisexuals and transgender people under the bus, otherwise there’d be fewer TERFs.

[1:42] … and I frequently emphasise, while many won’t, the fact that a significant amount Muslims harbour harmful anti-LGBT+ views that desperately need to be addressed… [1:50]

Yep, he couldn’t help but go out of his way to toss Muslims under the bus. In reality, US Muslims are as tolerant of gays and lesbians as US Protestants, and more tolerant than White Evangelicals and Black Protestants. They also show a greater increase in tolerance than any other US group (+25 points between 2011 and 2017), including the “unaffiliated” and atheists (+5 points). I can’t speak for worldwide stats, but there’s more tolerance than you’d think.

[1:50] Now, to be as clear as I possibly can, I know that I made a few big mistakes within my recent video on transgender athletes, and as I’ve stated publically, I am working on a video in which I express my altered views and apologise for what I got wrong, but that’s the thing… I was WRONG on some things… not transphobic. [2:13]

Bigotry is, quite literally, promoting false information. If it were true that black people smoked more weed than white people, it would not be bigoted to say black people are inherently attracted to the ganja, but it’s not true, so it is bigotry. Rationality Rules, by his own admission, spread misinformation about transgender athletes. That makes those statements bigoted. We’d be justified in calling him a bigot if he kept repeating those assertions despite having his bigotry pointed out to him.

[6:57] I’ll shortly be publishing a video in which I acknowledge my mistakes… [7:01]

It’s been over a month, and all he’s done is change the original video title to say “it contains errors,” and change the blurb to read:

Hey all. I want to make very clear that I made a few major mistakes within this video, and that due to this I’ll be publishing a new video relatively soon in which I correct these mistakes and express my altered views. To be clear, I haven’t done a complete u-turn, but my views have indeed changed in very important ways.

What errors did he make? Wouldn’t it be easy to just pop in a bullet-point summary to prevent the misinformation from spreading? Apparently, Rationality Rules would rather tease us about a future video, in which he presumably explains why he hasn’t “done a complete u-turn” about transgender athletes, than stop the spread of misinformation. He’s described his mistakes as both “major” and something “any reasonable person may make.” That does not exactly inspire confidence.

Nor does his divide-and-conquer approach.

[5:04] I feel like I’ve been used, and that I’ve been thrown under a bus immediately after appearing on record-breaking shows for the sake of appeasing a few hypersensitive individuals. Now, with this said, I want to make something crystal clear: I know for a matter of fact that there are many people within the ACA that do not agree with board’s statement, and so please be sure NOT to vent your frustration at the ACA’s public figures, as they’re not responsible for the board’s statement, and they do not necessarily endorse it. [5:31]

All the GOOD people in the ACA agree with him, the BAD people are just “hypersensitive individuals.” How does he know this? He is literally pitting the ACA against itself, fanning the flames of anger even further. And as hinted at earlier, it worked. He continues to use this tactic in a more recent video.

[1:05] Now, a great many of the trans community have reached out to me both privately and publicly to make clear that they were not hurt by my previous video, and that they certainly didn’t find it, or me, to be transphobic. [1:17]

[2:18] I’ve always tried to be an ally to the LGBT+ community, and have always meant for my videos to reflect this, and so to know that my mistakes and hyperbole has likely emboldened some actual bad actors weighs heavily on me… but the weight that I feel is NOTHING compared to those who suffer at the hands of said bad actors. [2:40]

Not only is he pulling out the “my black friend agrees with me” defense, he’s actually saying the transgender people criticizing him do more harm to the transgender community than his misinformation! Like, wow. WOW. He seems to believe that transgender people cannot be wrong about gender identity, when in reality some are no less transphobic than TERFs.

There’s a lot more smoke than fire here, but I think it’s enough to argue Rationality Rules is a transphobe who doesn’t want to admit it. Not convinced? EssenceOfThought made the same argument, from a very different angle and with much better evidence.

Woodford presented cis women as having their dreams, their scholarships, and their careers taken from them by trans women. [quote] The implication here being that the very thing Woodford argues is a terrible crime against cis women, should in effect be forced upon trans women. Because if trans women are excluded from said sports as he argues they should, that’s the result. And he appears to see that as a preferable outcome. For trans women to be excluded from women’s sports, just to ensure cis women don’t lose to them.

Woodford also resorted to fear mongering, using claims such as these to paint trans people as a threat to society. [quote] This sort of statement is designed to create or feed a moral panic. It differs in no way to the claim that gay teachers are a threat to children. It’s designed to emotionally manipulate the listener into believing that there’s some ticking clock of catastrophe, when there’s really none. […]

The prejudice he started with. The dehumanisation he carried out. The threat he cast trans people as. And the way he used this to argue for the removal of their human rights. All of this adds up to show why his video was undoubtedly transphobic.

I found it convincing, and I bet you will too.

Reprobate Spreadsheet

/dev/random, unless I make a hash of it
