Oversampling in Statistics
In statistics, oversampling means deliberately drawing a larger, disproportionate sample from a particular subgroup than simple random sampling would yield. Depending on the structure of the survey or poll, oversampling might raise the small number of respondents from an underrepresented minority group to a representative proportion. In some cases, it might result in the minority group being over-represented. However, these “disproportionate” samples are intended to let researchers study small groups effectively, not to bias survey or poll results (Merger 2016).
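Survey analysts usually keep an oversampled design from distorting whole-population results by weighting. Each sampled unit in stratum h gets the design weight N_h / n_h, which is the stratum's population size divided by its sample size. The sketch below contrasts a naive pooled mean with a design-weighted mean. It is a minimal illustration, and the population, trait rates, and sample sizes are all hypothetical, invented for the example.

```python
import random

# Hypothetical population (numbers invented for illustration):
# a majority stratum of 9,000 people, 30% of whom have some trait (y = 1),
# and a minority stratum of 1,000 people, 60% of whom have the trait.
POPULATION = {
    "majority": [1] * 2700 + [0] * 6300,
    "minority": [1] * 600 + [0] * 400,
}


def draw_stratified(population, sizes, seed=0):
    """Simple random sample without replacement within each stratum."""
    rng = random.Random(seed)
    return {g: rng.sample(units, sizes[g]) for g, units in population.items()}


def unweighted_mean(sample):
    """Pool every sampled unit and ignore how the sample was drawn."""
    values = [y for units in sample.values() for y in units]
    return sum(values) / len(values)


def weighted_mean(population, sample):
    """Give each sampled unit in stratum h the design weight N_h / n_h."""
    num = 0.0
    den = 0.0
    for g, units in sample.items():
        w = len(population[g]) / len(units)  # N_h / n_h
        num += w * sum(units)
        den += w * len(units)
    return num / den


true_mean = sum(sum(u) for u in POPULATION.values()) / sum(
    len(u) for u in POPULATION.values()
)

# Proportional allocation of 1,000 interviews would give the minority only 100;
# here the minority stratum is oversampled to 500.
sample = draw_stratified(POPULATION, {"majority": 500, "minority": 500})
print(f"true mean:       {true_mean:.3f}")
print(f"unweighted mean: {unweighted_mean(sample):.3f}")
print(f"weighted mean:   {weighted_mean(POPULATION, sample):.3f}")
```

The pooled mean drifts toward the minority group's rate because that group makes up half of the sample but only a tenth of the population. The weighted mean undoes the disproportionate allocation, so the 500 minority interviews can still support estimates about that group on its own.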
Benefits of Oversampling
- Make a survey or sampling design more representative of a population, so that small groups are adequately covered,
- Reduce bias,
- Arguably, improve validity, although not everyone agrees that oversampling improves validity (Couper, as cited in Lazar et al. 2017).
Obtaining a large number of responses from a particular segment of the population reduces the likelihood that the segment will be underrepresented. For example:
- A 1992 telephone survey (Mohadjer & West) doubled the sample in geographic areas with large proportions of Black and Hispanic residents to improve the reliability of estimates for these populations.
- In a survey on sexual assault, Busch et al. (2003) oversampled African Americans and Hispanics to match Texas’s overall demographics. Adequate coverage matters here because sexual violence against Black women routinely goes unreported (Now.org, 2019).
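To see why doubling a subgroup's sample improves reliability, consider the standard error of an estimated proportion under simple random sampling, sqrt(p(1 - p) / n). Doubling n divides it by sqrt(2), which is roughly a 29% reduction. The sketch below makes that concrete. The proportion and the baseline sample size are hypothetical and do not come from the Mohadjer & West study. The formula also ignores the finite-population correction and any design effect from clustering or weighting.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling
    (no finite-population correction, no design effect)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5       # hypothetical proportion; 0.5 gives the largest standard error
base_n = 200  # hypothetical number of subgroup respondents before oversampling

se_base = standard_error(p, base_n)
se_doubled = standard_error(p, 2 * base_n)

# 1.96 * SE is the half-width of a normal-approximation 95% confidence interval.
print(f"n = {base_n}: SE = {se_base:.4f}, 95% margin = ±{1.96 * se_base:.3f}")
print(f"n = {2 * base_n}: SE = {se_doubled:.4f}, 95% margin = ±{1.96 * se_doubled:.3f}")
print(f"SE ratio after doubling: {se_doubled / se_base:.3f}")
```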
References
Busch, N. et al. (2003). The Health Survey of Texans: A Focus on Sexual Assault.
Lazar, J. et al. (2017). Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.
Merger, A. (2016). Oversampling is used to study small groups, not bias poll results.
Mohadjer, L. & West, J. (1992). Effectiveness of Oversampling Blacks and Hispanics in the NHES Field Test: National Household Education Survey. U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.
Now.org (2019). Black Women and Sexual Violence.