Effect Size / Measurement of Association
Before reading this article, you may want to review what a p value is.
The terms “measure of association” and “effect size” both refer to the same idea: quantifying the relationship between two groups. “Effect size” is the more common term in the medical field, where you want to know how an exposure is related to a disease (that is, what effect does the exposure have on the disease outcome?). In most other fields, “measure of association” is used informally to mean the same thing. The term can also refer to specific statistics and tests for relationships, such as:
- Chi-square test of independence
- Odds ratio
- Proportionate mortality ratio
- Rate ratio
- Risk ratio (relative risk)
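Two of the ratio measures above are simple arithmetic on a 2×2 table of exposure versus disease. The sketch below is a minimal Python illustration, assuming the conventional cell layout (a, b, c, d) described in the docstring. The function names and the counts are hypothetical and were chosen only to show the calculation.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk ratio (relative risk) from a 2x2 table.

    Assumed layout (hypothetical):
        a = exposed, disease      b = exposed, no disease
        c = unexposed, disease    d = unexposed, no disease
    """
    risk_exposed = a / (a + b)        # proportion of exposed people with the disease
    risk_unexposed = c / (c + d)      # proportion of unexposed people with the disease
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from the same 2x2 layout: (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed people develop the disease
print(f"Risk ratio: {risk_ratio(30, 70, 10, 90):.2f}")  # risk of 0.30 vs 0.10
print(f"Odds ratio: {odds_ratio(30, 70, 10, 90):.2f}")
```

In this made-up example the exposed group has three times the risk of the unexposed group, while the odds ratio is larger (about 3.86) because the outcome is common. The two ratios are close only when the disease is rare.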
Effect Size: Overview
The effect size is how large an effect of something is. For example, suppose medication A is better than medication B at treating depression. How much better is it? A traditional hypothesis test will not give you that answer: medication A could be ten times better, or only slightly better. That magnitude (twice as good? ten times as good?) is what the effect size measures.
Most statistical research includes a p value, which can tell you whether the difference between one treatment, process or other intervention and the alternative is statistically significant (that is, unlikely to be due to chance alone). But while a p value can indicate that one choice is probably more effective, it tells you practically nothing about how much more effective it is.
“Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude: not just, does a treatment affect people, but how much does it affect them.” (Gene V. Glass)
Effect size can tell you:
- How large the difference is between groups.
- The absolute effect (the difference between the average outcomes of two groups).
- The standardized effect (the difference expressed on a common, unit-free scale, so it can be compared across outcomes).
An example of an absolute effect: patients taking drug B for depression might see a mean improvement of 25 points on a depression test such as the Beck Depression Inventory. Standardized effect sizes work much like z-scores: they express an effect in standard deviation units, which gives it a numerical value that is easy to interpret and compare. For example, the categories on a Likert scale (agree, strongly agree, disagree, etc.) have more meaning when they are standardized.
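The sketch below contrasts an absolute effect with a standardized one. For the standardized measure it uses Cohen's d with a pooled standard deviation. That is one common choice, and this article does not specify it; treat it as an assumption. The improvement scores are hypothetical and far smaller than any real study.

```python
import statistics


def absolute_effect(treatment, comparison):
    # Difference between mean outcomes, in the outcome's own units (e.g., test points)
    return statistics.mean(treatment) - statistics.mean(comparison)


def cohens_d(treatment, comparison):
    # Standardized effect: the absolute effect divided by the pooled standard deviation
    n_t, n_c = len(treatment), len(comparison)
    var_t = statistics.variance(treatment)   # sample variance (n - 1 denominator)
    var_c = statistics.variance(comparison)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return absolute_effect(treatment, comparison) / pooled_sd


# Hypothetical improvement scores (points on a depression test) for two small groups
drug_b = [20, 25, 30]
drug_a = [10, 15, 20]
print(absolute_effect(drug_b, drug_a))  # 10 points
print(cohens_d(drug_b, drug_a))         # 2.0 standard deviations
```

With the same 10-point difference but more spread-out scores, d would shrink. That is the point of standardizing: it expresses the difference relative to the variability of the outcome.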
Why Use Effect Size?
Effect sizes allow a layperson (someone who isn't a statistician) to understand the results of your analysis. One widely reported study, indexed on PubMed, did not include an effect size. Its low p value showed that aspirin could help prevent myocardial infarction (heart attack), and physicians started to recommend aspirin as a general preventive measure against heart attacks. However, although aspirin did show potential for heart attack prevention, the effect size was tiny: a difference in risk of only 0.77%. Further studies showed that the effect size was even smaller for the general population, and recommendations for aspirin use have since been modified.
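The aspirin example turns on the fact that, with a large enough sample, even a very small difference in risk produces a very small p value. The sketch below illustrates this with a pooled two-proportion z-test. The counts are hypothetical: they were chosen so the risk difference comes out close to 0.77 percentage points, and they are not the study's actual data.

```python
import math


def risk_difference_and_p(events_1, n_1, events_2, n_2):
    """Risk difference and two-sided p value from a pooled two-proportion z-test."""
    p1 = events_1 / n_1
    p2 = events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    # Two-sided p value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, p_value


# Hypothetical counts: 11,000 people per arm; 220 vs 135 have the outcome
rd, p = risk_difference_and_p(220, 11_000, 135, 11_000)
print(f"risk difference = {rd:.2%}, p = {p:.1e}")
```

The risk difference is under one percentage point, yet the p value is far below 0.001. Reporting only the p value would hide how small the effect actually is.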
Common Measures for Effect Size
Three common measures in ANOVA are:
- Eta squared (η²)
- Partial eta squared (partial η²)
- Omega squared (ω²)
Other measures include:
- Cohen's d
- Glass's delta
- Hedges' g
- Pearson's r
- Cramér's V
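For a one-way ANOVA, eta squared and omega squared can both be computed from the sums of squares: η² = SS_between / SS_total, and ω² = (SS_between − df_between × MS_within) / (SS_total + MS_within). The sketch below assumes a standard one-way ANOVA on plain Python lists. The function name and data are hypothetical. In a one-way design, partial eta squared equals eta squared, so it is not computed separately.

```python
def one_way_effect_sizes(groups):
    """Return (eta_squared, omega_squared) for a one-way ANOVA on lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


eta_sq, omega_sq = one_way_effect_sizes([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"eta squared = {eta_sq:.3f}, omega squared = {omega_sq:.3f}")
```

For these example groups, η² = 54/60 = 0.9 and ω² = 52/61 ≈ 0.85. Omega squared is slightly smaller because it corrects for the upward bias of eta squared in samples.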