
## What is Causation?

According to Merriam-Webster, causation is “the act or process of causing something to happen or exist.” In other words, in the strict sense used here, causation means one event is certain to bring about another. If you paint, you’ll make a painting. If you stand in the rain, you’ll get wet.

On the other hand, Merriam-Webster states that correlation is “the relationship between things that happen or change together.” Correlation means there’s a relationship, but not a guaranteed one. If you paint, you *might* sell a painting. If you stand in the rain, you *might* get hit by lightning.

## Correlation vs. Causation

“…correlation does not imply causation, but it sure as hell provides a hint.”

Slate.com

In real life it’s sometimes hard to pinpoint causation. For example, take the statement “if you commit a felony, you’ll go to jail.” The reality is that you **might** go to jail... if you get caught. And even if you get caught, you might get yourself an excellent attorney who will get you probation and community service. So you **can’t say for sure** that committing a felony will **cause** you to go to jail. But there is a **definite link** in that if you commit a felony you are highly likely to go to jail (a lot more likely than someone who commits a minor crime or who doesn’t commit crimes at all). That link is what is called **correlation**; you can say there is a correlation between committing a felony and going to jail.

## Causation in Statistics

In statistics, correlation can be quantified as a correlation coefficient, a number that ranges from -1 to 1: zero is “no linear correlation” and 1 (or -1, for a negative relationship) is “perfect correlation.” Perfect correlation does exist, and in practice it can be hard to tell apart from causation, although even a perfect correlation does not by itself prove that one variable causes the other. You’ll rarely (if ever) use the term “causation” and instead you’ll be talking about various types of correlation coefficients and whether your results are statistically significant.
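To make the idea of "giving correlation a number" concrete, here is a minimal sketch that computes Pearson's r, one common correlation coefficient, in plain Python. Pearson's r runs from -1 to 1, and its absolute value measures how strong the linear relationship is. The function name `pearson_r` and the data are made up for illustration; they are not from any particular study or library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented purely for illustration
hours_in_rain = [1, 2, 3, 4, 5]
wetness = [2, 4, 6, 8, 10]   # rises in lockstep with hours_in_rain
dryness = [10, 8, 6, 4, 2]   # falls in lockstep with hours_in_rain

r_pos = pearson_r(hours_in_rain, wetness)  # perfect positive correlation
r_neg = pearson_r(hours_in_rain, dryness)  # perfect negative correlation
print(r_pos, r_neg)
```

Real data almost never lines up this neatly, which is why you'll usually see coefficients somewhere between the extremes. If you are on Python 3.10 or later, the standard library's `statistics.correlation` should give the same result without the hand-rolled function.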

Causation can be extremely hard to prove, as what you’re trying to prove is 100 percent correlation (which rarely happens). Take the case of cigarette smoking. For decades, activists, trade groups, and scientists debated whether tobacco smoke caused lung cancer and, if so, how strong the link was. Other explanations were suggested for the link between lung cancer and smoking, including sleep deprivation and alcoholism. In layman’s terms, it’s now known that smoking causes lung cancer. But in the strict sense used here, you can’t really say “cause,” as that would mean every single person who smoked even just one cigarette would get lung cancer. As statisticians, we say that there is a very strong correlation between smoking and lung cancer.
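The "other reasons" idea is worth seeing in action: two things can be strongly correlated simply because a third, hidden factor drives both. The sketch below simulates that situation with made-up numbers. The variable names (`hidden`, `a`, `b`) and the noise levels are assumptions chosen for illustration; this is not a model of smoking data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the simulation is repeatable
n = 2000

# A hidden factor that influences both observed variables
hidden = [random.gauss(0, 1) for _ in range(n)]

# a and b each depend on the hidden factor plus their own noise.
# Neither one appears in the formula for the other, so by construction
# a does not cause b and b does not cause a.
a = [h + random.gauss(0, 0.5) for h in hidden]
b = [h + random.gauss(0, 0.5) for h in hidden]

# Strong correlation, even though there is no causal link between a and b
r_ab = pearson_r(a, b)

# Subtract the hidden factor and only independent noise is left,
# so the correlation should drop to roughly zero
a_rest = [ai - h for ai, h in zip(a, hidden)]
b_rest = [bi - h for bi, h in zip(b, hidden)]
r_rest = pearson_r(a_rest, b_rest)

print(round(r_ab, 2), round(r_rest, 2))
```

With these settings the first correlation should come out near 0.8 and the second near zero. In real studies you usually can't subtract the hidden factor so neatly (you may not even know what it is), which is part of why pinning down causation takes so much more than a high correlation coefficient.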

For some true, funny examples of how correlation doesn’t always imply causation (like eating margarine and marriages in Kentucky), check out this guy’s site.
