Statistics How To

Stratification: Definition


What is Stratification?


Stratified random sampling is useful when you can subdivide populations. Here, habitat areas (quadrats) can be stratified into either Prairie or Forest. Image: Oregon State

Stratification means sorting data, people, or objects into distinct groups or layers, called strata. For example, you might sort “All people in the USA” into ethnic groups, income-level groups, or geographic groups.
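As a rough illustration, the sorting step can be sketched in a few lines of Python. This is a minimal sketch, not a standard routine: the function `stratify`, the income cut-offs in `income_band`, and the people listed are all hypothetical and chosen only to show the idea of putting each record into exactly one stratum.

```python
from collections import defaultdict

def stratify(records, key):
    """Sort records into strata (groups) according to key(record). Hypothetical helper."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    return dict(strata)

# Hypothetical data: (name, annual income in USD)
people = [("Ana", 28_000), ("Ben", 95_000), ("Cai", 52_000), ("Dee", 31_000)]

def income_band(person):
    # Hypothetical cut-offs, for illustration only
    income = person[1]
    if income < 40_000:
        return "low"
    elif income < 80_000:
        return "middle"
    return "high"

groups = stratify(people, income_band)
print(groups)
# {'low': [('Ana', 28000), ('Dee', 31000)], 'high': [('Ben', 95000)], 'middle': [('Cai', 52000)]}
```

The key function is what defines the strata; swapping `income_band` for a function that returns a region or an ethnic group would stratify the same records a different way.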

Fields that use this definition of stratification include the social sciences, where people are often sorted into groups by rank, caste, or other social status. For example, in England you are born into a specific social rank (e.g. royalty, upper class, middle class). Similarly, socioeconomic status places low-income groups at the bottom of a hierarchy and high-income groups at the top. In some cases it is possible to move up or down the social ladder; in a caste system or under slavery, however, movement is difficult or impossible.

In the Earth sciences, stratification usually refers to a natural process of separating layers rather than a man-made one. For example, bodies of water become stratified by salinity and temperature. In archaeology, stratification refers to the layers of ground in which objects are found. Although the artifacts are man-made, the strata themselves are formed by natural processes such as the deposition of sediment.

For a more detailed explanation of how stratified samples are gathered and used in statistics, see: How to get a stratified random sample.
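For a quick feel for the idea, here is a minimal Python sketch of a stratified random sample with proportional allocation (the same fraction drawn at random from every stratum), using prairie and forest quadrats like those in the image above as the strata. The function name, the 20% sampling fraction, and the quadrat counts are hypothetical, and proportional allocation is only one of several ways to decide how many units to draw from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum
    (proportional allocation). Hypothetical helper for illustration."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Round to the nearest whole unit, but keep at least one unit per stratum
        n = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical quadrats: 30 prairie and 10 forest, each tagged with its habitat
quadrats = [("prairie", i) for i in range(30)] + [("forest", i) for i in range(10)]
picked = stratified_sample(quadrats, stratum_of=lambda q: q[0], fraction=0.2, seed=1)
print({name: len(units) for name, units in picked.items()})
# {'prairie': 6, 'forest': 2}
```

Because each habitat is sampled separately, the smaller forest stratum is guaranteed to appear in the sample, which a single simple random sample of 8 quadrats could miss.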

Stratification in Clinical Trials

In clinical trials, patients are sometimes stratified into groups based on characteristics that may affect the trial outcome. The idea is to take the people with factors that could unbalance the trial, like obesity, age, or disease stage, group them into strata, and then randomize within each stratum so that those people are spread evenly across the treatment groups.
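One common way to carry this out is permuted-block randomization within strata, sketched below in Python. The article does not name a specific scheme, so this is an assumption, and the function name, the two arms "A" and "B", the block size of 4, and the patient list are all hypothetical. Each stratum gets its own sequence of blocks, and every block contains equal numbers of each arm in random order.

```python
import random
from collections import defaultdict

def stratified_block_randomization(patients, stratum_of, arms=("A", "B"),
                                   block_size=4, seed=None):
    """Assign each patient to an arm using permuted blocks within each stratum.

    Hypothetical helper for illustration only.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    remaining = defaultdict(list)  # unused slots in each stratum's current block
    assignment = {}
    for patient_id, covariates in patients:
        stratum = stratum_of(covariates)
        if not remaining[stratum]:
            # Start a new block with equal numbers of each arm, in random order
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            remaining[stratum] = block
        assignment[patient_id] = remaining[stratum].pop()
    return assignment

# Hypothetical patients: (id, (age group, disease stage))
patients = [
    (f"P{i:02d}", ("<65" if i % 3 else ">=65", "early" if i % 2 else "late"))
    for i in range(1, 13)
]

# Here the stratum is simply the (age group, stage) pair
assignment = stratified_block_randomization(patients, stratum_of=lambda cov: cov, seed=42)

# Count how many patients in each stratum went to each arm
tally = defaultdict(lambda: {"A": 0, "B": 0})
for patient_id, covariates in patients:
    tally[covariates][assignment[patient_id]] += 1
for stratum, counts in sorted(tally.items()):
    print(stratum, counts)
```

Because each block of 4 holds two A's and two B's, the arms within any one stratum can differ by at most 2 patients, however the random shuffles turn out.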

In large samples, randomization combined with covariate adjustment in the post-trial analysis may make stratification unnecessary (Peto et al.). For small samples*, Therneau states that creating too many strata can have the opposite of the desired effect: instead of a good balance of factors, there may be no balance at all, because there are too few patients in each stratum. In the extreme, you could end up with 400 strata for 400 patients, meaning one patient per stratum and nothing left to balance within any stratum. To avoid this pitfall, choose no more than 5 important factors, with 2 to 4 levels each, to stratify on.
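The arithmetic behind this warning is simple: the number of strata is the product of the number of levels of each stratification factor, so it grows multiplicatively as factors are added. The Python sketch below (function names hypothetical) computes the number of strata and the average number of patients per stratum for a trial of 400 patients.

```python
from math import prod

def strata_count(levels_per_factor):
    """Number of strata = product of the number of levels of each factor."""
    return prod(levels_per_factor)

def patients_per_stratum(n_patients, levels_per_factor):
    """Average number of patients per stratum if patients spread evenly."""
    return n_patients / strata_count(levels_per_factor)

for levels in ([2, 2, 2], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3], [4, 4, 4, 4, 4]):
    k = strata_count(levels)
    print(f"{len(levels)} factors with levels {levels}: {k} strata, "
          f"{patients_per_stratum(400, levels):.2f} patients per stratum for n = 400")
```

For example, three two-level factors give 8 strata (50 patients per stratum), five two-level factors give 32 strata (12.5 per stratum), and five four-level factors give 1,024 strata, more strata than patients.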

*A small sample in a clinical trial is usually under 400 people.

Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG: Design and analysis of randomized clinical trials requiring prolonged observation on each patient. 1: Introduction and design. British Journal of Cancer 34:585-612, 1976

Therneau, T. How many Stratification Factors are “Too Many” to Use in a Randomization Plan? Retrieved June 5, 2017 from:

University of New Mexico (n.d.). Sociology: Understanding and Changing the Social World. Retrieved June 5, 2017 from:



Stratification: Definition was last modified: October 12th, 2017 by Stephanie