Tuesday, December 13, 2011

The Statistical Anomaly

“There are lies, damned lies, and statistics.” How is that for starting off with a cliché one-liner?! However, it is true that statistics can be quite misleading or difficult to interpret at times. Enter Simpson's Paradox. I'm going to walk through two interesting real-world examples of it.

         Applicants   Admitted
Men      8442         44%
Women    4321         35%

The above numbers are graduate school admissions at UC Berkeley from the fall of 1973. It sure seems that men were more likely to be admitted than women. Looking at these figures, would you accuse the university of gender bias? Well, some people did, and they sued!

So Berkeley decided to take a closer look at the numbers. Admissions are per department, so they wanted to find out which specific departments were guilty of a significant bias against women. Guess what... none of them were.


Here's a snippet with the admission rates of the top six departments so you can see for yourself:

Department   Men                     Women
             Applicants   Admitted   Applicants   Admitted
A            825          62%        108          82%
B            560          63%        25           68%
C            325          37%        593          34%
D            417          33%        375          35%
E            191          28%        393          24%
F            272          6%         341          7%

Look at that! If anything, what we see here is a “small but statistically significant bias in favor of women”. That's a quote from the research paper that analyzed the data (Bickel, Hammel & O'Connell, “Sex Bias in Graduate Admissions: Data from Berkeley”, Science, 1975).
It turns out that women applied more often to competitive departments that had low admission rates in general, so pooling all the departments together skews the numbers. See how easy it is to be misdirected?
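If you want to check the arithmetic yourself, here's a minimal sketch in Python that recomputes the aggregate rates from the six-department table above (so the pooled numbers won't exactly match the campus-wide 44% and 35%, which include every department):

# Per-department (applicants, admission rate) pairs copied from the
# table above; index 0 = men, 1 = women.
departments = {
    "A": ((825, 0.62), (108, 0.82)),
    "B": ((560, 0.63), (25, 0.68)),
    "C": ((325, 0.37), (593, 0.34)),
    "D": ((417, 0.33), (375, 0.35)),
    "E": ((191, 0.28), (393, 0.24)),
    "F": ((272, 0.06), (341, 0.07)),
}

def pooled_rate(sex):
    """Aggregate admission rate: total admitted / total applicants."""
    applicants = sum(d[sex][0] for d in departments.values())
    admitted = sum(d[sex][0] * d[sex][1] for d in departments.values())
    return admitted / applicants

print(f"men:   {pooled_rate(0):.0%}")   # ~46% across these six departments
print(f"women: {pooled_rate(1):.0%}")   # ~30%, despite leading in four of six

Even restricted to these six departments, the men pool to roughly 46% and the women to roughly 30%: the heavy traffic of women into the tough departments drags their aggregate down, while no single department treats them worse.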

Another real-life example, then. This one is from a medical study comparing two treatments for kidney stones. It shows the success rate of each treatment overall, and also broken down into small and large stones.
So tell me, which treatment is more effective?

              Small stones      Large stones      Both (total)
Treatment A   81 / 87 = 93%     192 / 263 = 73%   273 / 350 = 78%
Treatment B   234 / 270 = 87%   55 / 80 = 69%     289 / 350 = 83%

For small stones, the table says treatment A is best. For large stones, it says the same. But without making this distinction in stone size, it says treatment B is best. Huh?

For each stone size, the group sizes of the two treatments being compared are very different (87 versus 270 patients for small stones, 263 versus 80 for large stones). This causes problems when combining the data. See, doctors tend to give the traditional treatment A to the more severe cases with large stones, while milder cases with small stones are more often treated with the less invasive treatment B. So treatment A's total is dominated by hard cases and treatment B's total by easy cases. Since cases with small stones have a better success rate in general, this makes treatment B look better than it actually is.
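To see the reversal fall out of the arithmetic, here's a minimal sketch in Python using the raw fractions from the table above; nothing more than pooling the two stone sizes is needed to flip the comparison:

# (successes, patients) per stone size, copied from the table above.
treatments = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

for name, strata in treatments.items():
    for size, (successes, patients) in strata.items():
        print(f"{name} {size}: {successes / patients:.0%}")
    # Pooling the strata is where treatment B sneaks ahead.
    total_successes = sum(s for s, _ in strata.values())
    total_patients = sum(p for _, p in strata.values())
    print(f"{name} overall: {total_successes / total_patients:.0%}")

Treatment A wins both strata (93% versus 87%, and 73% versus 69%), yet loses overall (78% versus 83%), purely because of how the patients are distributed over the groups.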

With group sizes such as these you shouldn't combine (‘aggregate’) the data. The proper conclusion, therefore, is that treatment A is best!
