Statistics How To

Heterogeneity and Heterogeneous Data in Statistics

What is Heterogeneity?

Heterogeneity in statistics means that your populations, samples, or results differ from one another. It is the opposite of homogeneity, which means that the populations, data, or results are alike.

A heterogeneous population or sample is one where members have differing values for the characteristic you’re interested in. For example, if the people in your group ranged from 4’3″ to 7’6″ tall, they would be heterogeneous for height. In real life, heterogeneous populations are extremely common. For example, patients are typically a very heterogeneous population, as they differ in many ways, including demographics, diagnostic test results, and medical histories.

Heterogeneity in Clinical Trials and Meta-Analysis

In clinical trials and meta-analysis, heterogeneity of results means that studies have widely varying outcomes. Some studies might show favorable results, while others show unfavorable results. For example, some studies may say that sugar is linked to obesity, while others report that sugar isn’t linked to obesity (outside of it being a source of calories).

Statistical heterogeneity only comes to light after results from studies are analyzed. Ways to figure out if the results are homogeneous or not (i.e. whether the studies agree with each other) include:

  • Forest plot: a graph that shows results from several studies side-by-side.
    A forest plot showing odds ratios, confidence intervals, and a summary measure. Image: James Grellier | Wikimedia Commons.

  • L’Abbé plot: plots the event rates for control groups and experimental groups against each other.
  • Cochran’s Q: used to find differences among three or more matched sets.
  • Chi-square test for homogeneity: tests to see if two populations come from the same unknown distribution (if they do, then they are homogeneous). A low p-value for this test means that the heterogeneity in the data/results is significant.
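  • I squared statistic: based on Cochran’s Q, I² gives the percentage of variation across studies that is due to heterogeneity rather than chance. The formula is:
    I² = 100% × (Q – df)/Q,
    where Q is Cochran’s Q and df is the degrees of freedom (the number of studies minus one).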
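
As a rough illustration of the I squared formula, here is a minimal Python sketch. It assumes Q is Cochran’s Q computed the usual meta-analysis way (each study weighted by the inverse of its variance) and df is the number of studies minus one. The effect sizes and variances are made up for the example, and the function names are hypothetical. Negative results are set to zero, which is a common convention when Q is smaller than df.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled effect."""
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, df):
    """I squared as a percentage: 100% * (Q - df) / Q, floored at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical log odds ratios and their variances from four studies.
effects = [0.10, 0.30, 0.55, 0.80]
variances = [0.02, 0.03, 0.02, 0.04]

q = cochran_q(effects, variances)
df = len(effects) - 1  # number of studies minus one
print(f"Q = {q:.2f}, df = {df}, I squared = {i_squared(q, df):.1f}%")
```

A higher percentage suggests that more of the variation between studies reflects real differences rather than chance.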

------------------------------------------------------------------------------

Heterogeneity and Heterogeneous Data in Statistics was last modified: November 13th, 2017 by Stephanie Glen

2 thoughts on “Heterogeneity and Heterogeneous Data in Statistics”

  1. Arun

    Hi,

    Thanks for providing the information.
    However, I find that the 2 sentences below are contradictory. Any clarification will be highly appreciated.

    a) Heterogeneity in statistics means that your populations, samples or results are different.
    b) A heterogeneous population or sample is one where every member has the same characteristic you’re interested in. For example, …