Things that correlate


Did you know that US crude oil imports from Norway correlate almost perfectly with drivers killed in collision with railway train? It’s true! Obviously, Norwegian oil magnates are murdering Americans with trains now.

It’s all from a little site called Spurious Correlations, which takes any one set of numbers you choose and then dredges through a database of other numbers to find similar patterns. This is going to be useful the next time I have to teach my genetics students a little basic statistics: correlation is not causation, and you can cherry-pick data sets to find all kinds of meaningless patterns.
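
To make the dredging idea concrete, here is a minimal Python sketch. It assumes that purely random trending series can stand in for the site’s database. The series length, database size, seed, and the helper names (`pearson_r`, `random_walk`, `dredge`) are illustrative choices, not how the site actually works.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def random_walk(n_years, rng):
    """A meaningless trending series: the running sum of random steps."""
    level, series = 0.0, []
    for _ in range(n_years):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

def dredge(target, database):
    """Return (index, r) of the database series with the largest |r| against target."""
    best_i, best_r = None, 0.0
    for i, candidate in enumerate(database):
        r = pearson_r(target, candidate)
        if abs(r) > abs(best_r):
            best_i, best_r = i, r
    return best_i, best_r

rng = random.Random(2014)   # fixed seed so the run is repeatable
N_YEARS = 11                # one value per year, e.g. 1999-2009 (assumed)
N_SERIES = 5000             # size of the hypothetical "database"

target = random_walk(N_YEARS, rng)
database = [random_walk(N_YEARS, rng) for _ in range(N_SERIES)]
best_i, best_r = dredge(target, database)
print(f"best of {N_SERIES} unrelated series: #{best_i}, r = {best_r:+.3f}")
```

If you change the seed, the winning series changes every time, but the best of 5,000 unrelated series is nearly always a strong match. Search enough meaningless data and a close fit is all but guaranteed.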

Comments

  1. ilgeo says

    The idea that correlation does not necessarily imply causation has always been tough to teach. This site drills the concept in very effectively.

  2. unclefrogy says

    there is something about it that reminds me of technical stock analysis, especially the graphs
    uncle frogy

  3. twas brillig (stevem) says

    The idea that correlation does not necessarily imply causation has always been tough to teach.

    Yes, because it does imply causation. The important thing to teach is to then try to prove it!
    The problem the aphorism “Correlation is not causation” addresses is that too often people think a single correlation is absolute proof of causation. Contrarians, meanwhile, will point out, “But doesn’t all of science begin with someone seeing a correlation between two events? If correlation did NOT indicate causation, science would never get anywhere.” And the ultimate contrarian argument: “If A causes B, B has to be correlated with A. So how can you say correlation is not due to causation?”
    I agree, it is very tough to teach that implication does *not* equal certainty, and that correlation just points at something to look at more closely. It’s also important to teach that an unexpected lack of correlation is worth looking into. There was a great scientist who is said to have said, when his experiment didn’t give the results he was expecting, “That’s interesting, let me think about why that happened,” rather than just saying his assistant botched the experiment or throwing the results in the trash.

  4. mikeyb says

    Of course we have to be careful, because this works both ways. CO2 correlates with rising mean global temperature, but it is also a major cause. Pseudo-skeptics make these sorts of arguments to deny evolution, climate change, the age of the earth, or the correlation between inequality and low tax rates. I think the difference is that when there are independent measurements and approaches to correlate variables, the case for causation becomes much stronger. For example, independent radiometric dating methods, each based on a different radioactive element, agree on the age of the earth, which makes the case for an old earth nearly certain.

    Correlation does not imply causation; scientific consensus does not equal settled science. But when several independent sets of correlations agree, and the scientific consensus rests on a broad array of scientific facts, then a much stronger case can be made.

    I just worry that this is used just as easily to ‘refute’ good science as it is to support pseudoscience.

  5. Trebuchet says

    Is that a correct English spelling, motzarella?

    Not as far as I know, but it does show up in Google.

    2001 and 2007 were apparently good years for pizza.

  6. Terska says

    The problem with “correlation is not causation” is that now every troll on the internet thinks correlation means nothing at all. Correlation can be a pretty good start.

  7. Terska says

    I chose precipitation in AL, and it randomly chose to correlate it with precipitation in MS. Not so Spurious.

  8. ibyea says

    The number of Nic Cage movies correlates with the number of people drowning in swimming pools, at r = 0.67.

  9. Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says

    No one has linked to this yet?

    Make sure you read the mouseover text.

    @Kagehi:

    Yep: clearly the best ones are the ones where you can almost picture a plausible causation mechanism: consumption of mozzarella cheese with civil engineering doctorates awarded, for instance. Kentucky marriages and deaths by falling out of a boat is another.

    Those are also the best ones for really teaching the scientific method.

  10. numerobis says

    About KY marriages v death by falling out of a boat, someone on my FB commented:

    It makes total sense. Married people need to get away from their spouse and so spend more time on fishing boats, probably drunk.

  11. numerobis says

    I figure it’s the rational argument against gay marriage that they should have tried, rather than letting some judge overturn the discriminatory law and violate the sacred principle of mob rule.

  12. Paul Brown says

    How cool is that site?

    And here’s the thing. Buried in that site’s corpus, there are some (a few?) genuine examples of causation, if only of the ‘C causes both A and B’ variety.

    For example, Number of people who died by becoming tangled in their bedsheets correlates with Total revenue generated by skiing facilities (US). Why? Because each year there are more people! And (I think it’s fair to say) population increases cause both deaths by bedsheet entanglement and revenue at skiing facilities to go up. Put that in your Bayesian prior and smoke it!

    So yeah – superb teaching tool. Both about the limits and the utility of correlation.
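
    A minimal simulation of the ‘C causes both A and B’ point, with entirely made-up numbers: the starting population, 1% growth rate, per-person rates, noise level, and seed below are assumptions for illustration, not figures from the site.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(7)
years = range(1999, 2010)                    # 11 hypothetical years

# C: the common cause. Population grows about 1% a year (assumed figures).
population = [280e6 * 1.01 ** (y - 1999) for y in years]

# A and B each scale with population, times independent +/-1% noise.
# Neither is computed from the other, so there is no direct A -> B link.
bedsheet_deaths = [p * 2.5e-6 * rng.uniform(0.99, 1.01) for p in population]
ski_revenue = [p * 10.0 * rng.uniform(0.99, 1.01) for p in population]

raw_r = pearson_r(bedsheet_deaths, ski_revenue)

# Dividing by population removes C; what remains is just the independent noise.
per_capita_r = pearson_r(
    [d / p for d, p in zip(bedsheet_deaths, population)],
    [s / p for s, p in zip(ski_revenue, population)],
)
print(f"raw r = {raw_r:.2f}, per-capita r = {per_capita_r:.2f}")
```

    Expect the raw correlation to come out strong and the per-capita one to be typically weak and of arbitrary sign; exact values depend on the seed. Removing the common cause is what exposes that A and B have nothing to do with each other.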

  13. ChasCPeterson says

    This is really fucking stupid. Correlations are properly used to test hypotheses. That is, you need a valid hypothesis of a relationship (not necessarily causation, but a reason for suspecting a relationship) before it’s even worth calculating a correlation coefficient. Then you can calculate the probability of seeing a correlation coefficient that large by chance alone, to judge whether the relationship is likely to be meaningful. Throwing random variables together is disdained by statisticians as a “fishing expedition”. It means nothing, proves nothing, and demonstrates nothing. I don’t even see what’s amusing about it.
    “Spurious” indeed. Actually, I’d re-name the site “Fucking Stupid Correlations”.
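
    One way to put a number on the probability behind an observed correlation coefficient is a permutation test. It is swapped in here because it needs only Python’s standard library; a t-test on r is the textbook alternative. The Bonferroni adjustment shows why a fishing expedition undercuts any single match: the more pairings you try, the weaker each apparent hit becomes. The function names, toy data, and 10,000-shuffle default are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test of H0: xs and ys are unrelated.

    Shuffling ys breaks any pairing with xs; the p-value estimate is the share
    of shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one so the estimate is never 0

def bonferroni(p, n_tests):
    """Bonferroni adjustment: the price of having tried n_tests pairings."""
    return min(1.0, p * n_tests)

xs = list(range(11))                       # hypothetical data
ys = [2.0 * x + (-1) ** x for x in xs]     # a trend plus a small wiggle
p = permutation_p_value(xs, ys)
print(f"r = {pearson_r(xs, ys):.3f}, p = {p:.5f}")
print(f"after trying 5000 pairings: p_adj = {bonferroni(p, 5000):.2f}")
```

    With the add-one estimate, the smallest p this test can report is 1/10,001. A Bonferroni correction for 5,000 tried pairings can therefore never push it below about 0.5 here; to claim more, you would need more shuffles or an analytic test.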

  14. chigau (違う) says

    Chas
    Did you go to the site?
    The whole point is that it’s arbitrary and silly.

  15. numerobis says

    What is the emoticon to denote the sound of the point flying high overhead?

  16. Amelia Lewis says

    My favorite used to be “consumption of ice cream” and “death by drowning”, which are apparently very strongly correlated.

    (both rise with the temperature, btw … though this may be less true now than when I were a tad)

    More heat == more frozen treats && more jumping into water to splash about and cool off.

    Amy!

  17. says

    This is really fucking stupid.

    Wow man, it’s like... all holistic and stuff, dude. Chill. Like, just a second, my calculator is telling me that the math I just did is, “A Suffusion of Yellow”. Wow, cosmic!

    lol Sorry, just had to. Because you just know there is someone out there, someplace, who might actually “think” some of these things really are connected.

  18. rorschach says

    700 (presumably) Americans die each year by becoming tangled in their bedsheets? How does that work?

  19. Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says

    Okay, I had to come back b/c I had a few minutes and I went back to the site for more and found this gem:

    Points Harvard scored against Yale in The Game
    correlates with
    Deaths caused by confinement to a low-oxygen environment

    So then….some of the correlations *aren’t* spurious?

    That one had me gasping.

  20. unclefrogy says

    I have heard it argued that if there are things that are easy to find out and that correlate strongly with something else that is hard to predict, you can make money by betting on what the hard thing will be in the future. That seems to be how many stock market indicators are used to predict price movements, along with the secret formulas (proprietary methods) in the story of how they beat the market. Of course, sometimes they just use phony numbers to fake it, or inside information to cheat.
    The story is always better analysis of correlations.
    As has been said, it is ridiculous and silly, but sometimes it works, and it always sells. It is a real thing.
    uncle frogy

  21. knowknot says

    What I learned today:
    Bicyclists tend to run into things and die because they are easily distracted by clumsy women dying.
    (Except in 2007, when bicyclists were, for some reason, less interested in clumsy dying women, even though clumsy women apparently didn’t like being ignored and put some extra effort into dying.)
    Project for tomorrow:
    Finding ways to incite gratuitous concern in gullible persons.
    Or, if I can find some kind of correlation between degrees awarded in fields related to optics and admissions to inpatient mental health facilities, create a whole new field of research (and accompanying thread) for medic0506.