Heterogeneity and Heterogeneous Data in Statistics

What is Heterogeneity?

Heterogeneity in statistics means that your populations, samples, or results differ from one another. It is the opposite of homogeneity, which means that the populations, data, or results are alike.

A heterogeneous population or sample is one where members have different values for the characteristic you’re interested in. For example, if the people in your group ranged from 4’3″ to 7’6″ tall, the group would be heterogeneous for height. In real life, heterogeneous populations are extremely common. For example, patients are typically a very heterogeneous population, as they differ in many factors including demographics, diagnostic test results, and medical histories.
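One rough way to put a number on how heterogeneous a sample is for a single characteristic is to compare its spread to its mean. The sketch below uses the coefficient of variation for that purpose. The function name and both height samples (in centimeters) are made up for illustration, and the coefficient of variation is only one of several possible measures of spread.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless spread)."""
    return stdev(values) / mean(values)


# Hypothetical heights in centimeters, invented for this example.
similar_heights = [169, 170, 170, 171, 170]   # nearly homogeneous group
varied_heights = [130, 150, 175, 200, 229]    # roughly 4'3" to 7'6"

cv_similar = coefficient_of_variation(similar_heights)
cv_varied = coefficient_of_variation(varied_heights)

print(f"CV, similar group: {cv_similar:.3f}")
print(f"CV, varied group:  {cv_varied:.3f}")
```

The larger the value, the more the members differ from each other relative to the typical height, so the second group would be described as more heterogeneous for height.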

Heterogeneity in Clinical Trials and Meta-Analysis

In clinical trials and meta-analysis, heterogeneity of results means that studies have widely varying outcomes. Some studies might show favorable results, while others show unfavorable results. For example, some studies may say that sugar is linked to obesity, while others report that sugar isn’t linked to obesity (outside of it being a source of calories).

Statistical heterogeneity only comes to light after results from studies are analyzed. Ways to figure out whether the results are homogeneous (i.e. whether the studies agree with each other) include:

Forest plot: a graph that shows results from several studies side-by-side.
[Figure: a forest plot showing odds ratios, confidence intervals, and a summary measure. Image: James Grellier | Wikimedia Commons.]
L’Abbé plot: plots the event rates for control groups and experimental groups against each other.
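Cochran’s Q: in meta-analysis, a weighted sum of the squared differences between each study’s effect and the pooled effect across studies. A test of the same name is also used to find differences in matched sets of three or more.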
Chi-square test for homogeneity: tests whether two or more populations come from the same unknown distribution (if they do, then they are homogeneous). A low p-value for this test means that the heterogeneity in the data/results is significant.
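I² statistic: based on Cochran’s Q, I² estimates the percentage of variation across studies that is due to heterogeneity rather than chance. The formula is:
I² = 100% × (Q − df)/Q,
where:
- Q = Cochran’s Q and
- df = degrees of freedom (the number of studies minus one).
Negative values of I² are set to zero.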
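The I² formula can be sketched in a few lines of Python. The example assumes each study reports an effect estimate and the variance of that estimate, and it computes Cochran’s Q with inverse-variance weights, which is the usual meta-analysis form. The function names and the four study values are hypothetical, chosen only to show the arithmetic.

```python
def cochran_q(effects, variances):
    """Return (Q, df, pooled effect) for per-study effects and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, one variance per effect")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, pooled


def i_squared(q, df):
    """I² = 100% * (Q - df) / Q, with negative values set to 0."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical log odds ratios and their variances, invented for this example.
study_effects = [0.10, 0.30, 0.35, 0.65]
study_variances = [0.010, 0.020, 0.015, 0.030]

q, df, pooled = cochran_q(study_effects, study_variances)
print(f"Q = {q:.2f}, df = {df}, I² = {i_squared(q, df):.1f}%")
```

When Q is no larger than its degrees of freedom, the studies vary about as much as chance alone would predict, and the function reports 0% instead of a negative percentage.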
