Correlation and Causality

Correlation and Causality - what are they?

Correlation and causality are two concepts in statistics that are widely confused and misunderstood by the public at large. In statistics, correlation means that two events or variables tend to vary together; it's essentially a statement that A and B happen together, or that one tends to follow the other. Many people jump to the assumption that because A and B are correlated, A causes B. It's possible (but by no means certain) that A does cause B; if so, that would be an example of causality.
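To make "related" concrete, one common measure of correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is a minimal hand-written version; the values in `a` and `b` are made up purely for illustration. Note that the coefficient comes out the same whichever variable is listed first, so the number by itself cannot say which variable, if either, is the cause.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, made-up observations of two variables A and B.
a = [1, 2, 3, 4, 5, 6]
b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r_ab = pearson_r(a, b)
r_ba = pearson_r(b, a)
print(f"r(A, B) = {r_ab:.3f}")
print(f"r(B, A) = {r_ba:.3f}")  # identical: the coefficient has no direction
```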

Causality is one possible type of relationship between two correlated events, but there are others. When A and B are correlated, there are four possible relationships:
A causes B--for example, babies born before 36 weeks gestation are, on average, smaller than those born at 40 weeks. The low birth weight is a direct result of not having as much time to grow.
B causes A--the faster a wind turbine rotates (A), the higher the wind speed (B) tends to be. It is the wind that causes the turbine to rotate, not the turbine that causes the wind.
A and B are both caused by a third unidentified factor, but do not cause each other--see the HRT study described below.
A and B are unrelated and the correlation is merely a coincidence--a golfer was wearing a red shirt the day he hit his first hole-in-one.
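The last case, coincidence, is easy to underestimate. The following sketch generates many pairs of short series that are unrelated by construction and reports the strongest correlation found among them. The number of points, the number of pairs, and the seed are arbitrary choices made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
N_POINTS = 5            # observations per series (arbitrary)
N_PAIRS = 1000          # number of unrelated pairs examined (arbitrary)

strongest = 0.0
for _ in range(N_PAIRS):
    a = [rng.gauss(0, 1) for _ in range(N_POINTS)]
    b = [rng.gauss(0, 1) for _ in range(N_POINTS)]  # drawn independently of a
    r = pearson_r(a, b)
    if abs(r) > abs(strongest):
        strongest = r

print(f"strongest correlation among {N_PAIRS} unrelated pairs: {strongest:.2f}")
```

With so few observations per series, searching through enough unrelated pairs is very likely to turn up one that looks strongly correlated, which is one reason a single striking correlation needs to be replicated.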

When one study shows a correlation, further studies are needed to replicate and substantiate the findings before causality can be determined. The type of study and the methods used are important, too. Mere anecdotal cases do not constitute evidence.

Observational studies are just that: researchers observe events over time without intervening. In medicine, observational studies have shown tendencies in populations, such as that diabetes is more common in certain ethnic groups. Experimental studies compare groups with variables controlled. Randomized, controlled drug trials, comparing a new drug with a placebo, are the "gold standard" of experimental medicine. Participants are assigned at random to a study group or a control group, so that the two groups are matched as closely as possible for age, sex, race, socioeconomic factors and disease stage; one group is given the drug being evaluated, and the other group is given a placebo. If the study group shows a better treatment result than the control group, that is good evidence that the tested drug caused the improvement.
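As a rough sketch of why random assignment matters, the simulation below assigns hypothetical patients to a drug or a placebo by shuffling, checks that the two groups end up with a similar average age, and then compares recovery rates. Every number here (the age range, the recovery probabilities, the assumed drug benefit of 0.20) is invented for illustration; none comes from a real trial.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical patients; in this toy model older patients recover less often.
patients = [{"age": rng.randint(30, 80)} for _ in range(2000)]

# Random assignment: shuffle the list, then split it in half.
rng.shuffle(patients)
half = len(patients) // 2
drug_group, placebo_group = patients[:half], patients[half:]

ASSUMED_DRUG_BENEFIT = 0.20  # invented effect size, for illustration only

def recovers(patient, treated):
    """Simulate one outcome; recovery chance falls from 0.70 at age 30 to 0.45 at 80."""
    chance = 0.70 - 0.005 * (patient["age"] - 30)
    if treated:
        chance += ASSUMED_DRUG_BENEFIT
    return rng.random() < chance

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Baseline check: random assignment should leave the groups similar in age.
age_gap = mean(p["age"] for p in drug_group) - mean(p["age"] for p in placebo_group)

drug_rate = mean(recovers(p, True) for p in drug_group)
placebo_rate = mean(recovers(p, False) for p in placebo_group)

print(f"difference in mean age between groups: {age_gap:+.2f} years")
print(f"recovery rate, drug group:    {drug_rate:.3f}")
print(f"recovery rate, placebo group: {placebo_rate:.3f}")
```

Because chance alone decides who gets the drug, age (and any other factor, measured or not) tends to be balanced across the groups, so a difference in recovery rates can reasonably be attributed to the drug.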

Let's look at an example. It is now the general consensus of the medical community that cigarette smoking causes or exacerbates many health problems, including lung cancer, other lung diseases such as emphysema, and heart disease. The earliest observational studies, in the 1940s, showed that significantly greater percentages of cigarette smokers eventually developed lung cancer. The tobacco companies rightly asserted that the correlation alone was not enough evidence. However, further studies strengthened the case in three ways. They demonstrated a dose-effect relationship: the more a person smoked, the greater their chance of developing illness. They showed that stopping smoking reduced the risk. And cigarettes and cigarette smoke were shown to contain substances that, in other unrelated studies, had been demonstrated to cause cancer. Taken together, these multiple lines of evidence are enough to infer causality. The tobacco companies also had an obvious bias--they did not want people to believe that their products were harmful.

In another medical example, a 2004 epidemiological study showed that post-menopausal women who took hormone replacement therapy (HRT) had a lower incidence of coronary heart disease (CHD). But randomized controlled drug trials showed a higher incidence of CHD in women who received HRT. Clearly, more research was necessary. Re-examination of the epidemiological data showed that the women who received HRT also tended to have more education and higher socioeconomic status than those who did not, and better diet and exercise regimens. In the observational data, the use of HRT and the lower rate of CHD were both linked to a common, other factor--the women's life circumstances--and the apparent protective effect of HRT was not causal.
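The pattern in the epidemiological data can be imitated with a toy simulation. In the sketch below, a hidden factor (labeled `healthy_lifestyle`, standing in for the women's life circumstances) makes a person both more likely to take a treatment and less likely to develop disease, while the treatment itself has no effect at all by construction. That is a simplification, and all of the probabilities are invented for illustration; they are not estimates from the HRT studies.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

people = []
for _ in range(100_000):
    healthy_lifestyle = rng.random() < 0.5
    # The hidden factor raises the chance of taking the treatment...
    treated = rng.random() < (0.7 if healthy_lifestyle else 0.2)
    # ...and lowers the chance of disease. `treated` plays no part here.
    disease = rng.random() < (0.05 if healthy_lifestyle else 0.15)
    people.append((healthy_lifestyle, treated, disease))

def disease_rate(rows):
    """Fraction of rows (lifestyle, treated, disease) in which disease occurred."""
    rows = list(rows)
    return sum(1 for _, _, disease in rows if disease) / len(rows)

# Crude comparison that ignores the hidden factor.
crude_gap = (disease_rate(p for p in people if not p[1])
             - disease_rate(p for p in people if p[1]))
print(f"ignoring the hidden factor: untreated minus treated = {crude_gap:+.3f}")

# The same comparison made separately within each level of the hidden factor.
stratum_gaps = {}
for lifestyle in (True, False):
    stratum = [p for p in people if p[0] == lifestyle]
    gap = (disease_rate(p for p in stratum if not p[1])
           - disease_rate(p for p in stratum if p[1]))
    stratum_gaps[lifestyle] = gap
    print(f"healthy_lifestyle={lifestyle}: untreated minus treated = {gap:+.3f}")
```

In the crude comparison the treated group appears to fare better, but once people are compared only with others who share the same value of the hidden factor, the apparent benefit should largely disappear.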

Whenever a study purports to show causality, it's important to ask questions before simply accepting the conclusion as fact. Was the study observational or experimental? Was it even a formal study, or just anecdotal? What have other studies shown? Is there any bias in the findings? Correlation CAN indicate causality, but by itself is not enough.