Monday, May 19, 2014

Correlation versus Causation

What's the difference between correlation and causation when it comes to statistics? If you're relying on statistics to prove your point in court, it's a distinction you must understand. According to
an action or occurrence can cause another (for example, smoking causes lung cancer), or it can merely correlate with another (for example, smoking is correlated with alcoholism). If one action causes another, then the two will generally be correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.

Unfortunately, our intuition can lead us astray when it comes to distinguishing between causation and correlation. For example, eating breakfast has long been correlated with success in school for elementary school children. It would be easy to conclude that eating breakfast causes students to be better learners. It turns out, however, that those who don't eat breakfast are also more likely to be absent or tardy, and it is absenteeism that plays a significant role in their poorer performance. When researchers retested the breakfast theory, they found that, independent of other factors, breakfast helps only undernourished children perform better.
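
The breakfast example describes what statisticians call a confounding variable: a third factor (here, absenteeism) that drives both of the things being compared. The short Python simulation below is only an illustrative sketch. Every number in it (the 30% absenteeism rate, the breakfast probabilities, the test scores) is invented for the demonstration and is not taken from any study, and it assumes for simplicity that breakfast has no effect on scores at all.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Return (often_absent, eats_breakfast, score) for n made-up students."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        often_absent = rng.random() < 0.3
        # Assumption: frequently absent children skip breakfast more often.
        eats_breakfast = rng.random() < (0.3 if often_absent else 0.8)
        # Assumption: the score depends on attendance only, never on breakfast.
        score = rng.gauss(60 if often_absent else 75, 10)
        rows.append((often_absent, eats_breakfast, score))
    return rows


def breakfast_score_corr(rows):
    """Correlation between eating breakfast (0 or 1) and test score."""
    return pearson([float(b) for _, b, _ in rows], [s for _, _, s in rows])


rows = simulate()
overall = breakfast_score_corr(rows)
among_attending = breakfast_score_corr([r for r in rows if not r[0]])
among_absent = breakfast_score_corr([r for r in rows if r[0]])

print(f"all students:          r = {overall:.2f}")
print(f"regular attenders:     r = {among_attending:.2f}")
print(f"frequently absent:     r = {among_absent:.2f}")
```

Across all simulated students, breakfast and scores are clearly correlated, even though breakfast was given no causal role. Once the students are split by attendance, the correlation within each group largely disappears, which is the pattern one would expect when a confounder is doing the work.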

To illustrate the pitfalls of conflating correlation and causation, check out Spurious Correlations, which reports that the number of people who drown annually by falling into a swimming pool correlates fairly closely with the number of films Nicolas Cage appeared in annually. The less-informed reader may conclude that the more films Mr. Cage appears in, the greater the likelihood of swimming pool drowning deaths.
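
One reason such coincidences are easy to find is sheer volume: compare enough unrelated data series and some pair will line up by chance. The Python sketch below illustrates that idea with purely random numbers. The choice of 200 series of 10 "annual" values each is an assumption made for the demonstration, and the code is not a description of how any particular website finds its examples.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def most_correlated_pair(num_series=200, years=10, seed=1):
    """Generate unrelated random series and return the best-matching pair."""
    rng = random.Random(seed)
    # Each series is independent noise, so no series can influence another.
    series = [[rng.gauss(0, 1) for _ in range(years)] for _ in range(num_series)]
    best = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = most_correlated_pair()
print(f"series {pair[0]} and series {pair[1]}: r = {r:.2f}")
```

With nearly 20,000 possible pairs and only ten data points per series, the strongest match will look impressive even though every series was generated independently.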

The more informed reader would simply create his or her own spurious correlations and have a laugh. For example, did you know that Ohio's sunshine has a direct effect on the number of lawyers Nebraska will have in any given year?