When I started out as a graduate student, I was a teaching assistant in a lab. Invariably in physics labs students are expected to measure some quantity multiple times and then take the average so as to minimize the effect of random uncertainties that are intrinsic to any measurement. I recall some students showing me a set of about six numbers and the average that they had calculated from it. They were amazed when I told them after a quick glance that the average was wrong.
It was not that I was some sort of human computer able to do complicated calculations in my head. It was that the average they had calculated was outside the range spanned by the individual measurements, an obvious impossibility. What this illustrates is that I had an intuitive idea of what an average is and roughly what value it should take, while they had not as yet developed that ability.
There are other intuitive features about averages that most of us have. One is that the average of two numbers must lie exactly in the middle. But this intuition can lead us astray sometimes, especially when we take the average of averages. For example, if we travel some distance at an average speed of 40 mph and then continue for the same distance at 20 mph, then what is the average speed for the full trip? The answer is not 30 mph (the average of the two speeds) but 26⅔ mph. The reason is that you cannot take the average of two (or more) averages unless the denominators are the same. In this case time is the denominator for calculating speeds and it is different for the two segments of the journey.
On the other hand, if we traveled at an average speed of 40 mph for 2 hours and then at 20 mph for another 2 hours, the average speed for the full trip would indeed be the average of the two speeds, 30 mph.
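For readers who like to check such things, here is a minimal Python sketch of the two trips. The function name `average_speed` and the 40-mile leg length are my own illustrative choices (any equal distance gives the same answer). The sketch simply divides total distance by total time, using exact fractions so that 26⅔ does not get rounded.

```python
from fractions import Fraction

def average_speed(segments):
    """Average speed over a trip: total distance divided by total time.

    `segments` is a list of (distance_miles, speed_mph) pairs.
    (Hypothetical helper, written only for this illustration.)
    """
    total_distance = sum(Fraction(d) for d, _ in segments)
    total_time = sum(Fraction(d) / Fraction(s) for d, s in segments)
    return total_distance / total_time

# Equal distances: 40 miles at 40 mph, then 40 miles at 20 mph.
# The legs take 1 hour and 2 hours, so the denominators differ.
equal_distance = average_speed([(40, 40), (40, 20)])

# Equal times: 2 hours at 40 mph (80 miles), then 2 hours at 20 mph (40 miles).
equal_time = average_speed([(80, 40), (40, 20)])

print(equal_distance)  # 80/3, i.e. 26 2/3 mph
print(equal_time)      # 30
```

Only in the equal-time case does the result agree with the simple average of the two speeds.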
I was reminded of another form of averaging to be wary of thanks to a comment on my post about circumcision and autism. The problem arises from the way that statistics about diseases are presented in health-related papers, where frequencies are expressed as 1 in some number X. This way of expressing it is simple and informative but can cause problems when we try to average two frequencies.
Suppose that the incidence of a disease among boys is 1 in 50 and among girls is 1 in 250, i.e., this disease is found five times more frequently in boys than in girls. What is the frequency of the disease in the population of children, irrespective of gender? One might intuitively think that since there are roughly equal numbers of boys and girls, the frequency would be close to 1 in 150 (exactly in the middle of 50 and 250), and exactly so if the numbers of boys and girls are equal. But that would be wrong. A frequency of 1 in X counts cases against a baseline of X people, and the two frequencies here are expressed against different baselines (50 and 250 in this example), so they cannot simply be averaged.
To get the actual gender-independent frequency, we need to realize that a 1 in 50 frequency means that 20 boys out of 1000 will have the disease. Similarly, 1 in 250 means that 4 girls out of 1000 will have the disease. Since we now have expressed them in terms of the same baseline of 1000, we can average the two numbers 4 and 20 and say that the frequency will be 12 out of 1000 or 1 in 83⅓.
Another way to see it is that if we consider 1000 children, 500 will be boys of whom 10 will have the disease while another 500 will be girls of whom 2 will have the disease. Hence out of 1000 children, 12 will have the disease and that works out to 1 in 83⅓.
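The same calculation can be sketched in a few lines of Python. The function name `combined_one_in` is hypothetical, and the sketch assumes equal numbers of boys and girls unless weights are supplied. It converts each "1 in X" figure to a rate, averages the rates, and converts back.

```python
from fractions import Fraction

def combined_one_in(one_in_values, weights=None):
    """Combine "1 in X" frequencies for several groups into one "1 in Y".

    Assumes equal group sizes unless relative `weights` are given.
    (Hypothetical helper, written only for this illustration.)
    """
    if weights is None:
        weights = [1] * len(one_in_values)
    # Put every group on the same baseline by converting to a rate.
    rates = [Fraction(1, x) for x in one_in_values]
    combined_rate = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
    return 1 / combined_rate

naive = Fraction(50 + 250, 2)           # the tempting but wrong answer
correct = combined_one_in([50, 250])    # boys 1 in 50, girls 1 in 250

print(naive)    # 150
print(correct)  # 250/3, i.e. 1 in 83 1/3
```

Averaging the X values directly gives 150, while averaging the rates gives the 1 in 83⅓ worked out above.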
This is one of those things that are trivial once you see it but can trip up the unwary.