A cautionary note about fMRI studies

I’ve been distracted lately — it’s end of the world semester time — and so I didn’t have time to comment on this recent PNAS paper that reports on dramatic sex differences in the brains of men and women. Fortunately, I can just tell you to go read Christian Jarrett, who explains most of the flaws in the study, or you can look at these graphical illustrations of the magnitude of the differences. I just want to add two lesser points.

First, let’s all be really careful about the overselling of fMRI, ‘k? It’s a powerful tool, but it’s got serious spatial and temporal resolution limitations, and it is not, as many in the public seem to think, directly visualizing the electrical signaling of neurons. It’s imaging the broader physiological activity (respiration, oxygen flux, vascular changes) in small chunks of the brain. If you’re ever going to talk about fMRI, I recommend that you read Nick Logothetis’s paper that coolly assesses the state of affairs with fMRI.

The limitations of fMRI are not related to physics or poor engineering, and are unlikely to be resolved by increasing the sophistication and power of the scanners; they are instead due to the circuitry and functional organization of the brain, as well as to inappropriate experimental protocols that ignore this organization. The fMRI signal cannot easily differentiate between function-specific processing and neuromodulation, between bottom-up and top-down signals, and it may potentially confuse excitation and inhibition. The magnitude of the fMRI signal cannot be quantified to reflect accurately differences between brain regions, or between tasks within the same region. The origin of the latter problem is not due to our current inability to estimate accurately cerebral metabolic rate of oxygen (CMRO2) from the BOLD signal, but to the fact that haemodynamic responses are sensitive to the size of the activated population, which may change as the sparsity of neural representations varies spatially and temporally. In cortical regions in which stimulus- or task-related perceptual or cognitive capacities are sparsely represented (for example, instantiated in the activity of a very small number of neurons), volume transmission— which probably underlies the altered states of motivation, attention, learning and memory—may dominate haemodynamic responses and make it impossible to deduce the exact role of the area in the task at hand. Neuromodulation is also likely to affect the ultimate spatiotemporal resolution of the signal.

Just so you don’t think this is a paper ragging on the technique, let me balance that with another quote. It’s a very even-handed paper that discusses fMRI honestly.

This having been said, and despite its shortcomings, fMRI is currently the best tool we have for gaining insights into brain function and formulating interesting and eventually testable hypotheses, even though the plausibility of these hypotheses critically depends on used magnetic resonance technology, experimental protocol, statistical analysis and insightful modelling. Theories on the brain’s functional organization (not just modelling of data) will probably be the best strategy for optimizing all of the above. Hypotheses formulated on the basis of fMRI experiments are unlikely to be analytically tested with fMRI itself in terms of neural mechanisms, and this is unlikely to change any time in the near future.

The other point I want to mention is that there’s a lot of extremely cool data visualization stuff going on in fMRI studies, and also that what you’re really seeing is data that has been grandly massaged. Imagine that I take a photo of my wife’s hand, and my hand. If I just showed you the raw images, the differences would be obvious, and you’d probably have no problem recognizing which was the man’s and which was the woman’s. This is not true of the raw data from two brain scans from a woman and a man — without all kinds of processing and data extraction (legitimate operations, mind you) it would look like a hash of noise. But do we look at two people’s hands, with obvious differences, and announce that we’ve made a dramatic discovery that sex differences are hardwired? So why do scientists get away with it if it involves sticking heads in a very expensive machine that makes funny noises?

Furthermore, the processing done in this instance was designed to abstract and highlight the differences, amplifying their perception. Take the photos of my wife’s hand and mine, and now do some jazzy enhancement to subtract out anything that is the same, so the bulk of the images are erased as unimportant, and then pseudocolor the remainder into neon reds and blues, and display it in 3 dimensions, rotating. That would be a weird, complex image far removed from the mundane familiarity of the shape of the hand, but it would emphasize real differences to an extraordinary degree, while obscuring all of the similarities, and give a false impression of the magnitude of the differences.
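
To put rough numbers on that hand analogy, here is a minimal Python sketch. Everything in it is invented for illustration (the pixel values, the 1% difference, the function names `difference_map` and `stretch_to_full_range`); it is not the processing used in the PNAS paper, just a toy version of "subtract what is shared, then stretch what is left across the whole color scale."

```python
# Toy illustration only: invented numbers, not the pipeline of any real study.

def difference_map(image_a, image_b):
    """Subtract image_b from image_a pixel by pixel, erasing whatever they share."""
    return [a - b for a, b in zip(image_a, image_b)]

def stretch_to_full_range(values):
    """Rescale so the smallest value maps to 0.0 and the largest to 1.0,
    the way a pseudocolor scale can be stretched across the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical "hands": identical brightness except for 1% at two pixels.
hand_a = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
hand_b = [100.0, 101.0, 100.0, 100.0, 99.0, 100.0]

raw_relative_difference = max(abs(a - b) / a for a, b in zip(hand_a, hand_b))
displayed = stretch_to_full_range(difference_map(hand_b, hand_a))

print(f"largest raw difference: {raw_relative_difference:.0%}")
print("displayed intensities:", displayed)
```

In this toy case the raw images differ by at most 1%, yet after subtraction and rescaling the two differing pixels sit at opposite ends of the display range. The differences are real; the picture just says nothing about how small they are relative to everything that was subtracted away.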

Let’s not assign all the differences to something genetic, either (although of course, some are modulated by biological — but not really genetic — differences). If you were to do the same comparison of my hand to my father’s, you’d see much grander differences than between mine and my wife’s. He was a manual laborer and mechanic, and I recall doing the comparison myself: his hands were muscular, powerful, calloused, deeply lined. I should have gotten a photo while he was alive so I could publish it in PNAS, touting significant biological differences between father and son.

(via Stephanie)

Logothetis NK (2008) What we can do and what we cannot do with fMRI. Nature 453(7197):869-78. doi: 10.1038/nature06976.


  1. davidwhitlock says

    This cannot be emphasized enough, the BOLD fMRI signal is purely hemodynamic. It is the acute difference in magnetic susceptibility of a volume depending on the differential perfusion of that volume element with arterial (HbO2 (diamagnetic)) or venous (Hb (paramagnetic)) blood.

    The imputation that what is being measured is some sort of cerebral metabolic rate of oxygen (CMRO2) consumption is not correct. The BOLD signal occurs before action potentials propagate into the volume element.

    Usually there is a very tiny inverse signal (a drop in HbO2 levels) before the BOLD signal indicating an increase in HbO2. I suspect this very tiny signal is due to consumption of O2 by nNOS to generate the NO which causes the vasodilation which results in the BOLD signal.

    There is no separate “nutritive” blood flow regulation. The blood flow measured in the BOLD signal is all the “extra” blood flow that that volume element gets. If it isn’t “enough”, what physiology does is ablate excess metabolic consumption in that volume element.

    What the BOLD signal is, in a physiological sense, is where the NO level is high enough to activate guanylyl cyclase and generate sufficient cGMP to dilate smooth muscle in the vessels and cause vasodilation and increased perfusion by HbO2. BOLD does not measure “activation”, it measures NO levels. There is very good correlation between NO levels and the later-in-time propagation of action potentials into that volume element, but that correlation is not one-to-one.

    The BOLD signal is a relative signal. The magnetic susceptibility of the whole brain is taken and averaged, and deviations from that average are the “BOLD signal”. Those deviations are small (~1%?). The axons that are differentially firing are still a tiny fraction of the axons in that volume element.

    One of the important differences in males and females is the presence of estrogen and the estrogen receptor. The estrogen receptor activates nitric oxide synthase and makes NO when activated by estrogen. This is likely the pathway by which estrogen has neuroprotective effects. Males and females have different levels of the different estrogen receptors, estrogen and other stuff, and this results in differential NO backgrounds (likely the reason pre-menopausal women don’t get as much heart disease). This will also affect the amount of NO necessary to generate the “BOLD signal”.

    I would expect to see different “BOLD signals” between men and women. So what? We know the “important” stuff about cognition doesn’t relate to blood flow (which is all that BOLD measures).

  2. Max says

    Your last paragraph highlights that the problem is not in finding differences between men and women, it’s in trying to claim those differences exist from only two samples (one from each sex).

    With an adequate sample size and good statistical techniques, we can absolutely draw reasonable conclusions about the general differences between men and women in different traits. That there isn’t 100% non-overlap between sexes is irrelevant.

  3. says

    The question isn’t even whether there are these differences, but where they originate and what they mean. To stay with the image of the hand: nobody would be surprised if we found differences between the hands of violinists and construction workers.
    So why should anybody be surprised to find differences between the brains of men and women?

  4. Enkidum says

    Isn’t the male/female brain paper a DTI paper? In which case it’s not fMRI, although some of the same criticisms would apply.

  5. permanentwiltingpoint says

    Well, when I read about this kind of thing I’m always reminded of the study where they did a brain scan on a dead salmon, and found brain activity:

    Bennett, C. M., et al. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction


  6. Frank Asshole says

    That cannot be stressed enough. Everything has its limitations in accuracy. The biggest problem with fMRI studies is, in my opinion, the researchers, who as humans are prone to biases, to the sexiness of a new method of data analysis that they don’t understand well, etc. They sometimes read too much into their data and go too far in their conclusions. Newspeople are not without guilt. Also, the general public doesn’t know that a single voxel of fMRI data can contain thousands of neurons. They see nice visualisations and headlines like “This particular blob contains a memory of your grandma”. Although there is a correlation between single-unit recording and functional data (r=0.75, Mukamel 2005, in auditory cortex), fMRI is an approximation and, nomen omen, a rather bold one :). But it is one of the best tools we currently have. So then, apply skepticism in large buckets, bring researchers to the table. It is science, not a village pie contest. Hurt feelings don’t matter.
    Speaking of sex differences, the intervariability matters. You are dealing with individuals, so conclusions drawn from populations have their limitations.

  7. Frank Asshole says

    @permanentwiltingpoint: it is my favourite piece of science too. I’m in love with that kind of provocation. But it stresses the necessity of knowing how to perform a data analysis. My philosopher friend always brings this up as “evidence” of the unreliability of fMRI studies. Unfortunately, he knows too little about the method to accept any arguments.

  8. marcoli says

    @#5: That study is extremely weird, but it does make a good point: these studies need to find a way to clear out false-positive signals in these images. Showing the dead salmon pictures of humans in social situations, and asking it to express how it feels about them was hilarious!

    On a more serious note, the study on humans does not disturb me at all. We should be looking for differences between the brains of men and women (there should be differences whether they are learned or innate), and we should try to be neutral about initial claims when such differences are found. I say let science do its job.

  9. Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says

    Showing the dead salmon pictures of humans in social situations, and asking it to express how it feels about them was hilarious!


    I am so dying right now. I can’t even seriously think about reading the actual study until I can calm my breathing.

    Oh, great gods of science trolling, that is amazing.

  10. marcoli says

    @ Crip Dyke: Click on the link from permanentwiltingpoint, and try to soberly assess the experimental methods with calm dispassion.

  11. ChasCPeterson says

    Isn’t the male/female brain paper a DTI paper? In which case it’s not fMRI

    um, yes, and that kind of makes the entire OP a giant “oops!”, suggesting that its author is opining about a study with which he lacks even cursory familiarity.
    As I understand it, the technique used does not map areas of increased bloodflow/neural activity, but rather maps static structural pathways of white matter, i.e. myelinated axons that connect different gray-matter (synaptic/processing) regions of the brain. That’s a major difference in what’s being measured and claimed. The extensive critical quotes turn out to be completely irrelevant.

    some of the same criticisms would apply.

    Which? Quantitative differences in structure are quantitative differences regardless of the data-massaging algorithms applied.
    My strong suspicion would be that “soft” connections, like synapses, would be much more plastic than hard-wired (almost literally) white-matter axonal connections, which would severely weaken the it-might-not-be-genetic argument.

    All that said, I’m happy to acknowledge that the differences reported are small and statistical (i.e. with lots of overlap), and probably over-interpreted in terms of functionality. But none of that is surprising.

  12. gillt says

    Brain scans definitely aren’t my area of expertise, but I’m fortunate enough to know someone who does brain imaging for a living. While waiting for that response, I found this storify to be pretty insightful.


    As well as some early research:

    Gender differences in brain functional connectivity density

    The organization of intrinsic brain activity differs between genders: a resting-state fMRI study in a large cohort of young healthy subjects.

    Why Sex Matters for Neuroscience

  13. Frank Asshole says

    PZ is definitely right. You have a voxel with data not on changes in local blood flow but on a mapping of water diffusion in cells and their environment. DTI itself is just mapping. It looks so cool when rendered in 3D. Of course it is not without its failings and limitations. Diffusion volumes are more prone to artifacts due to heterogeneity of the magnetic field and the physics behind it. DTI data can be combined into probabilistic tractography, giving us a degree of certainty that given tracts exist; combined with functional connectivity from resting state, we can approximate the direction of a tract to a certain degree. It requires further methodological and statistical fettling to, for example, ensure that a given tract is functionally uniform and not two tracts combined. Also, DTI only measures white matter, which is myelinated axons. It doesn’t deal with neuron bodies. There is also a problem with averaging data. I am currently dealing with primary auditory cortex, which is notoriously hard to work on because of its small dimensions and huge intersubject variability. I am pointing this out because there is so much enthusiasm for using these methods, and sometimes we (researchers and those who read their work) forget about the current constraints of a method.
    Overall, this field is relatively new and will probably take some time to clean itself up, so caution is required.

  14. Enkidum says

    The issues of interpreting function from indirect measures are similar for fMRI and DTI, but the resolution differs between the two. In particular, there is no issue of temporal resolution with DTI (because you’re not looking at things that change in a matter of seconds, or generally even in a matter of weeks).

    I’m not saying you shouldn’t worry about DTI or that it’s somehow immune to criticism, but it is a bit of a problem to try to move directly from criticisms of fMRI to criticisms of DTI. (Note that fMRI, like DTI, is a paradigm that uses MRI – fMRI is not the same thing as MRI.)

  15. Frank Asshole says

    Of course. To add, DTI doesn’t fully overlap, or should I say doesn’t 100% fill, the cytoarchitectonic structure, so generalisations are limited. Also, in the process of averaging, we are losing so much data in terms of interindividual variability.

    I think what gillt posted, that ‘deevybee’ link, contains valid criticism. It renders this paper a piece of not-so-good but sensationalist science.

  16. Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says

    @permanentwiltingpoint, #5
    From the study linked:

    Subject. One mature Atlantic Salmon (Salmo salar) participated in the fMRI study.
    The salmon was approximately 18 inches long, weighed 3.8 lbs, and was not alive at
    the time of scanning.

    Once again, must cease hyperventilating before I can move on. I did manage to get through 2 other independent paragraphs before losing it over this one, however.

  17. davidwhitlock says

    The PNAS paper is a DTI (diffusion tensor) paper. What is being measured is the anisotropic diffusion of hydrogen. Except it isn’t actually “diffusion” per se, it is movement of hydrogen between when the spins are flipped and when they are measured some time later. That is called “diffusion” because in isotropic liquid media, that is what causes the hydrogen to move, diffusion.

    In a brain, there are other things that make the hydrogen move, there is the convective flow of blood due to the heart pumping, there is water diffusion, there is also cytoplasm entrained by movement of things inside cells, there is also “hypermobile water”, where water is made to diffuse faster by diluting it with non-water to form fluid phases with intermediate hydrophobic/hydrophilic balance. Some of the movement of stuff is powered by ATP (cargo being moved by motors in cells on the cytoskeleton), and also the heat shock proteins causing (and un-causing) movement as in the folding and unfolding of proteins (and other conformation changes).

    It turns out that the movement of hydrogen in the brain is anisotropic. The precise hydrogen that is moving anisotropically is not known. A very important source of movement is the carrying of cargo in axons. Axons are part of a neuron. The nuclear DNA is all in the cell body, in the nucleus. When “stuff” needs to be made, mostly it has to be made near the cell body because that is where all the DNA is. Things like mitochondria pretty much have to be made near the cell body because they need to be (somewhat) near the DNA that makes the several thousand proteins that comprise mitochondria. Mitochondria only have a few genes, so they only make a few proteins, most proteins (99%+) come from nuclear DNA.

    The brain is (mostly) divided into gray matter and white matter. Gray matter is gray because it has a lot of DNA. White matter is white because it doesn’t have much DNA, what it has is a lot of myelin. Myelin is what surrounds axons in a multi-layer structure for various (and unknown) reasons.

    It turns out that there appears to be increased movement in the tracts where there are a lot of axons. People call this “diffusion”, but I suspect a lot of it is actually convection due to entrainment of cytoplasm by the motors that are moving cargo (things like mitochondria) from the cell body out to the tippy end of the axon and back (to be recycled by autophagy).

    This carrying of cargo is going on all the time and is pretty important. It also consumes ~ half the ATP generated in the brain. What does the brain do if there is not enough ATP? It tries to make more, and it also uses less. Since the carrying of cargo takes a long time, it is something that (pretty much always) can be put off for “later”.

    What fMRI measures is blood flow. All neurodegenerative disorders are associated with reduced blood flow. All neurodegenerative disorders are also associated with reduced “diffusion”. That is called “white matter hyperintensities”. What it is is regions of reduced diffusion in the white matter. Why are there regions of reduced “diffusion” in regions of white matter? Good question, the answer to which remains unknown. There is no visible lesion that shows up in the spot where the “white matter hyperintensity” “lesion” is observed on MRI.

    I suspect that the “white matter hyperintensities” are not so much regions of reduced “diffusion”, but regions of reduced convection, convection of cytoplasm entrained along with the cargo being transported by ATP-powered motors in the axons. Why would there be less movement in axons? What if there wasn’t enough ATP? What would be the “logical” thing to shut down first if there wasn’t enough ATP? How about something that takes hours or days? The way that axonal transport takes hours or days (or even weeks). Mitochondria have a speed of ~ 1 micron per second in axons. Some axons are 0.5 meter long, at 1 micron per second, that is about 5 days. If you have an ATP crisis for a few seconds, shutting down something that will take days to occur isn’t a big deal. On the other hand, if the ATP level never goes back up, what is the brain going to do, other than slowly degenerate?

    It turns out that the same thing that regulates blood flow (nitric oxide), is the same thing that regulates the ATP level (nitric oxide through guanylyl cyclase). What a coincidence that neurodegenerative disorders are associated with reduced blood flow and reduced ATP production and reduced hydrogen “diffusion”.

  18. Mark Weber says

    I think it would be intriguing to do a study which compared the brain activity of women with strongly ingrained traditional gender roles, e.g. those from conservative/fundamentalist Christian, Jewish, & Muslim backgrounds, with “liberated” women, e.g. women with careers, especially those in senior management. Would the apparent differences found in this study mirror those in my proposed study? In other words, are the differences somehow intrinsic based on sex or grounded in roles typically assigned by sex/gender? A similar study of stay-at-home dads and career men would be similarly illuminating.

  19. Enkidum says


    Well to be really anal fMRI measures deoxygenated haemoglobin, so it doesn’t even have a direct measure of blood flow.

  20. Enkidum says

    @Mr Asshole – I didn’t actually mean to respond directly to you, that was aimed at PZ #12. But I think we’re pretty much in agreement, anyways.

  21. ChasCPeterson says

    DTI is an MRI technique. Same issues of resolution & interpretation of function from indirect measures.

    Every MRI technique is similarly limited, despite the huge differences in the indirect measures and functions interpreted therefrom? Even simply mapping static structures as opposed to time-dependent fMRI? I don’t know enough to argue the technical details, but I am highly skeptical that that’s accurate. In any case, verbiage about hemodynamics is clearly irrelevant to the study in question.

    Gray matter is gray because it has a lot of DNA.

    maybe, in part. Blood vessels and mitochondria are probably more relevant.

    White matter is white because it doesn’t have much DNA, what it has is a lot of myelin. Myelin is what surrounds axons in a multi-layer structure for various (and unknown) reasons.

    More precisely, myelin consists of multiple layers of phospholipid-bilayer membrane, and its functions–insulating axons from crosstalk and increasing the speed of impulse propagation via saltatory conduction–are well understood afaik.

    It turns out that there appears to be increased movement in the tracts where there are a lot of axons.

    I don’t think the point is increased movement, but rather increased relative directionality of whatever movement is occurring.

    People call this “diffusion”, but I suspect a lot of it is actually convection due to entrainment of cytoplasm by the motors that are moving cargo (things like mitochondria) from the cell body out to the tippy end of the axon and back

    Such convection might contribute to the relative directionality detected (although since the transport occurs in both directions it’s not clear to me why there should be any net convection at all), but more important is the fact that the lipid layers of myelin prevent water/hydrogen from crossing the membrane perpendicularly to the axon.

    This carrying of cargo…consumes ~ half the ATP generated in the brain.

    Reference? I’d predict active ion transport, and maybe protein synthesis, to be more important in terms of total energy use.

  22. Amphiox says

    White matter is white because it is rich in lipids. Grey matter is grey because it doesn’t have as much lipid. Grey is the default color of unpigmented biological material.

    Living brain matter is pink and white, due to blood, rather than grey and white.
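
The dead salmon point raised in the comments (#5, #8) can be sketched in a few lines of Python. This is only a toy simulation under loud assumptions: the voxel count, the thresholds, and the function name `count_false_positives` are made up, the "scan" is pure Gaussian noise, and Bonferroni is used as the simplest stand-in for a multiple comparisons correction, not as the method of any particular paper.

```python
import random
from statistics import NormalDist

def count_false_positives(n_voxels, threshold, seed=0):
    """Count pure-noise voxels whose one-sided p-value falls below threshold."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    hits = 0
    for _ in range(n_voxels):
        z = rng.gauss(0.0, 1.0)  # noise only: there is no real signal anywhere
        p_value = 1.0 - standard_normal.cdf(z)
        if p_value < threshold:
            hits += 1
    return hits

n_voxels = 50_000          # hypothetical number of voxels tested
uncorrected_alpha = 0.001  # per-voxel threshold with no correction
family_alpha = 0.05        # desired chance of any false positive at all

# Same seed for both calls, so both thresholds are applied to the same noise.
uncorrected_hits = count_false_positives(n_voxels, uncorrected_alpha)
bonferroni_hits = count_false_positives(n_voxels, family_alpha / n_voxels)

print("expected false positives, uncorrected:", n_voxels * uncorrected_alpha)
print("observed false positives, uncorrected:", uncorrected_hits)
print("observed false positives, Bonferroni: ", bonferroni_hits)
```

With tens of thousands of independent tests, a per-voxel threshold of 0.001 should still let through something on the order of fifty "active" voxels from noise alone, which is the salmon's whole argument. Real voxels are spatially correlated, so real analyses tend to use less blunt corrections than Bonferroni, but the arithmetic of the problem is the same.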