Statistics How To

Heterogeneity and Heterogeneous Data in Statistics


What is Heterogeneity?

Heterogeneity in statistics means that your populations, samples, or results differ from one another. It is the opposite of homogeneity, which means that the populations, data, or results are the same.

A heterogeneous population or sample is one whose members have different values for the characteristic you’re interested in. For example, if the people in your group ranged from 4’3″ to 7’6″ tall, the group would be heterogeneous for height. In real life, heterogeneous populations are extremely common. For example, patients are typically a very heterogeneous population, as they differ in many factors, including demographics, diagnostic test results, and medical histories.

Heterogeneity in Clinical Trials and Meta-Analysis

In clinical trials and meta-analysis, heterogeneity of results means that studies have widely varying outcomes. Some studies might show favorable results, while others show unfavorable results. For example, some studies may say that sugar is linked to obesity, while others report that sugar isn’t linked to obesity (outside of it being a source of calories).

Statistical heterogeneity only comes to light after results from studies are analyzed. Ways to figure out whether the results are homogeneous (i.e. whether the studies broadly agree with each other) include:

  • Forest plot: a graph that shows results from several studies side-by-side.

    A forest plot showing odds ratios, confidence intervals, and a summary measure. Image: James Grellier | Wikimedia Commons.

  • L’Abbé plot: plots the event rates for control groups and experimental groups against each other.
  • Cochran’s Q: tests for differences across three or more matched sets. In meta-analysis, Q summarizes how far the individual study results fall from the pooled result.
  • Chi-square test for homogeneity: tests to see if two populations come from the same unknown distribution (if they do, then they are homogeneous). A low p-value for this test means that the heterogeneity in the data/results is significant.
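  • I squared statistic: based on Cochran’s Q, I² gives the percentage of variation across studies that is due to heterogeneity rather than chance. The formula is:
    I² = 100% × (Q – df)/Q,
    where Q is Cochran’s Q and df is its degrees of freedom (the number of studies minus one). Negative values are set to 0%.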
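To make the I squared formula concrete, here is a minimal sketch in Python. It assumes the meta-analysis form of Cochran’s Q, where each study is weighted by the inverse of its variance and Q is the weighted sum of squared deviations from the pooled effect. The function names (cochran_q, i_squared) and the four study results are made up for illustration; they do not come from any real study or library.

```python
def cochran_q(effects, variances):
    """Cochran's Q for a meta-analysis (inverse-variance weights assumed)."""
    weights = [1.0 / v for v in variances]
    # Pooled (fixed-effect) estimate: weighted mean of the study effects.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, df):
    """I squared = 100% * (Q - df) / Q, with negative values set to 0."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical effect sizes (e.g. log odds ratios) and their variances.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.01, 0.02, 0.015, 0.03]

q = cochran_q(effects, variances)
df = len(effects) - 1  # degrees of freedom: number of studies minus one
i2 = i_squared(q, df)

print(f"Q = {q:.2f}, df = {df}, I squared = {i2:.1f}%")
```

With these made-up numbers, Q is well above df, so most of the variation across the studies is attributed to heterogeneity rather than chance. When Q is smaller than df, the formula would give a negative number, which the sketch reports as 0%.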


Heterogeneity and Heterogeneous Data in Statistics was last modified: November 13th, 2017 by Stephanie