To a first approximation, science is about differences between groups: if death is a major motivator for human behavior, then we should expect people who have been reminded of their mortality to act differently than people who were not. Sometimes, a group is only conceptual: if the Higgs boson exists, then when slamming two protons together there should be more photons observed in the aftermath, compared to the amount we predicted. Sometimes, we generate groups after the fact: if we plot star colour and luminosity on a Hertzsprung-Russell diagram, stars naturally settle into four major groups. Sometimes, the “difference” we care about is that there is no difference at all: if a cosmetic is safe to use, then if we compare a group of people who use it to people who don’t, we should observe no difference in health. These divisions are so common that we often neglect to clearly delineate our groups: “does a daily dose of Aspirin prevent strokes?” implies that people who take Aspirin are less likely to get a stroke than people who do not.
At some point these groups must be clearly delineated, however; when they are not, a common problem in epidemiology, we lose our ability to find differences between them. Worse, fuzzy groups allow us to manufacture differences that don’t exist, say by classifying legitimate data as illegitimate “outliers” to get the results we want. This “differences between groups” metaphor is surprisingly powerful, to the point that’s a good solution to the demarcation problem. A core claim of astrology is that people differ based on the day they were born; if we divide people into those groups, yet fail to find differences, then astrology cannot be true. The Myers-Briggs personality test claims that we can divide people into specific groups, yet studies that use difference to reconstruct groups have failed to see those groups materialize.
By convention, if we are testing whether or not some change leads to a difference, we call the group we don’t change the “control” group. This group is often conceptual, thanks to frequentist statistical techniques, but that only works if the tools we use to find difference are perfectly calibrated; if they are not, the data might be biased and you’d never know. As a result, lacking a control group is considered a reason to suspect your results.
I apologize if all that was painfully obvious; I grasped these concepts way back in Junior High, well before I was legally allowed to drive. Still, I needed to type it out to convey the pain of what comes next.