What is Aggregation Bias?
In ecological studies, aggregation bias is the expected difference between the effect estimated for the group and the effect for the individual when there is no confounding. If there is confounding, the difference between group and individual effects is a combination of confounding and aggregation bias. Aggregation bias leads to the “ecological fallacy”: the mistaken conclusion that what is true for the group must be true for the sub-group or individual. It’s called aggregation bias because you’re taking aggregated data and extrapolating it inappropriately.
For example, you might have data showing that inner-city students tend to perform poorly on standardized tests. That doesn’t mean any one individual student will perform poorly. Likewise, you might show that one particular state has a lower than average per-capita income. You can’t say for sure that every county in that state has a lower than average income. And you definitely can’t say that every person in the state has a low income.
Aggregation problems can bias results from experiments and surveys. They can also distort the results of hypothesis testing and regression analysis.
Luloff and Greenwood (1980) found that increased aggregation causes unpredictable results:
- The coefficient of determination, R², sometimes falls, sometimes increases, and sometimes remains constant.
- Coefficients change sign and magnitude. In one case, the coefficient remained statistically significant after switching direction. In other cases, statistical significance is lost.
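To see how a coefficient can change sign and R² can shift under aggregation, here is a minimal simulation sketch. It is a toy illustration, not Luloff and Greenwood's data or method: the group structure, the slopes (-1 within groups, a baseline that rises by 2 per group) and the helper name ols_slope_r2 are all assumptions made up for this example.

```python
import numpy as np

def ols_slope_r2(x, y):
    """Slope and R^2 of a simple least-squares fit of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree 1: [slope, intercept]
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - residuals.var() / y.var()
    return slope, r2

rng = np.random.default_rng(0)
n_groups, n_per_group = 20, 50  # hypothetical sizes

# Each individual belongs to one group; groups differ in their baseline level.
group_id = np.repeat(np.arange(n_groups), n_per_group)
baseline = group_id.astype(float)

# Assumption: within a group, y FALLS as x rises (slope -1),
# while groups with a higher baseline have higher x AND higher y.
deviation = rng.normal(0.0, 1.0, group_id.size)
x = baseline + deviation
y = 2.0 * baseline - 1.0 * deviation + rng.normal(0.0, 1.0, group_id.size)

# Group means: this is the aggregated data an ecological study would see.
counts = np.bincount(group_id)
x_mean = np.bincount(group_id, weights=x) / counts
y_mean = np.bincount(group_id, weights=y) / counts

# Individual-level relationship within groups (each value minus its group mean).
ind_slope, ind_r2 = ols_slope_r2(x - x_mean[group_id], y - y_mean[group_id])

# Group-level relationship, fitted on the aggregated means only.
agg_slope, agg_r2 = ols_slope_r2(x_mean, y_mean)

print(f"within groups: slope={ind_slope:.2f}, R^2={ind_r2:.2f}")
print(f"group means:   slope={agg_slope:.2f}, R^2={agg_r2:.2f}")
```

In this particular setup the slope is negative for individuals within a group but positive for the group means, and R² is higher for the aggregated fit. Other setups could just as easily lower R² or leave it unchanged, which is the unpredictability described above.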
Example from Research
Perhaps the most famous example of an ecological fallacy is Durkheim’s 1897 study, which inferred that Protestants were more likely to commit suicide, based on data showing that countries with larger Protestant populations had higher suicide rates than countries with larger Catholic populations. The study failed to take confounding variables into account, such as the fact that Protestant countries differed in many ways from Catholic countries. In addition, Durkheim didn’t look at religious groups within countries when determining suicide rates; he just took data from countries as a whole.
References
Durkheim, E. (1897). Le suicide. Paris: F. Alcan. English translation by J. A. Spalding (1951). Toronto, Canada: Free Press/Collier-MacMillan.
Luloff, A. E., & Greenwood, P. H. (1980). Definitions of Community: An Illustration of Aggregation Bias. Station Bulletin 516. New Hampshire Agricultural Experiment Station. Durham, NH: University of New Hampshire.
Abhishek, V., Hosanagar, K., & Fader, P. S. (2015). Aggregation Bias in Sponsored Search Data: The Curse and the Cure. Marketing Science 34(1):59-77. http://dx.doi.org/10.1287/mksc.2014.0884