Post Hoc Definition and Types of Tests

Post hoc test results. Image: DOT.

Post Hoc Tests

Post hoc (Latin, meaning “after this”) tests are analyses you run after your experimental data has been collected, usually to find out which specific groups differ. They are often based on a familywise error rate: the probability of at least one Type I error in a set (family) of comparisons.

The most common post hoc tests are:

Bonferroni Procedure (Bonferroni Correction)
This multiple-comparison post hoc correction is used when you are performing many statistical tests (independent or dependent) at the same time. The problem with running many simultaneous tests is that the probability of getting at least one significant result purely by chance increases with each test run. The correction sets the significance cutoff for each test at α/n, where n is the number of tests. For example, if you are running 20 simultaneous tests at α = 0.05, the corrected cutoff would be 0.05/20 = 0.0025. The Bonferroni correction does suffer from a loss of power: because it guards so strictly against Type I errors, the Type II error rate for each individual test is high. In other words, it overcorrects for Type I errors.

Holm-Bonferroni Method
The ordinary Bonferroni method is sometimes viewed as too conservative. Holm’s sequential Bonferroni post hoc test is a less strict correction for multiple comparisons. See: Holm-Bonferroni method for a step-by-step example.
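
The sequential idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual step-down formulation: sort the p-values, compare the smallest to α/m, the next to α/(m – 1), and so on, stopping at the first one that fails. The function name holm_bonferroni and the p-values are made up for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens at each step: alpha/m, alpha/(m-1), ..., alpha/1.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values from four simultaneous tests.
p = [0.01, 0.04, 0.02, 0.005]
print(holm_bonferroni(p))
```

With these four made-up p-values, Holm rejects all four hypotheses, while the ordinary Bonferroni cutoff of 0.05/4 = 0.0125 would reject only the two smallest.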

Duncan’s new multiple range test (MRT)
When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from a set of at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q value is used.

Fisher’s Least Significant Difference (LSD)
A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values. See: Fisher’s Least Significant Difference.
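
For reference, the LSD is commonly written as shown below. The symbols are not defined in this article and are assumed here: MS_W is the within-group mean square from the ANOVA, N is the total number of observations, k is the number of groups, and n_i and n_j are the sizes of the two groups being compared. Two means are declared different when the absolute difference between them exceeds the LSD.

```latex
\mathrm{LSD} = t_{\alpha/2,\; N-k} \, \sqrt{ MS_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }
```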

Newman-Keuls
Like Tukey’s, this post hoc test identifies sample means that are different from each other. Unlike Tukey’s, Newman-Keuls uses different critical values for comparing different pairs of means. Therefore, it is more likely than Tukey’s to find significant differences.

Rodger’s Method
Considered by some to be the most powerful post hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.

Scheffé’s Method
Used when you want to look at post hoc comparisons in general (as opposed to just pairwise comparisons). Scheffé’s method controls for the overall confidence level. It is customarily used with unequal sample sizes.
See: The Scheffe Test.

Tukey’s Test
The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.
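
For equal group sizes, the Honest Significant Difference is commonly written as shown below. The symbols are assumptions, not taken from this article: q is the critical value of the studentized range distribution for k groups and N – k error degrees of freedom, MS_W is the within-group mean square from the ANOVA, and n is the number of observations per group. Any pair of means farther apart than the HSD is declared different.

```latex
\mathrm{HSD} = q_{\alpha;\, k,\, N-k} \, \sqrt{ \frac{MS_W}{n} }
```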

Dunnett’s correction
Like Tukey’s, this post hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean. For calculation steps, see: Dunnett’s Test.

Benjamini-Hochberg (BH) procedure
If you perform a very large number of tests, one or more of them is likely to give a significant result purely by chance. This post hoc procedure controls the false discovery rate: the expected proportion of significant results that are actually false positives. For more details, including how to run the procedure, see: Benjamini-Hochberg Procedure.
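
As a rough illustration, the BH procedure can be sketched in Python. This assumes the standard step-up rule: rank the p-values from smallest to largest, find the largest rank k whose p-value is at most (k/m)·q, and reject the k smallest. The function name benjamini_hochberg, the parameter q (the chosen false discovery rate) and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from five simultaneous tests.
p = [0.005, 0.028, 0.022, 0.035, 0.5]
print(benjamini_hochberg(p))
```

In this made-up example the four smallest p-values are rejected. Note that 0.022 misses its own threshold (2/5 × 0.05 = 0.02) but is still rejected because a larger p-value further down the ranking passes; that is what "step-up" means.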

More on the Bonferroni Correction

The Bonferroni correction (sometimes called the Bonferroni procedure) accounts for multiple tests.

The Bonferroni correction is used to limit the possibility of getting a statistically significant result by chance when testing multiple hypotheses. It’s needed because the more tests you run, the more likely you are to get a significant result. The correction shrinks the region where you can reject the null hypothesis. In other words, it makes the cutoff that your p-value has to beat smaller.

Imagine looking for the Ace of Clubs in a deck of cards: if you pull one card from the deck, the odds are pretty low (1/52) that you’ll get the Ace of Clubs. Keep trying (perhaps 50 times) and you’ll probably end up getting the Ace. The same principle works with hypothesis testing: the more simultaneous tests you run, the more likely you’ll get a “significant” result. Let’s say you were running 50 independent tests simultaneously with an alpha level of 0.05. The probability of observing at least one significant event due to chance alone is:
P(significant event) = 1 – P(no significant event)
= 1 – (1 – 0.05)^50 ≈ 0.92.
In other words, you are almost certain (92%) to get at least one significant result.
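
The same calculation can be repeated for any number of tests. The short Python sketch below assumes the tests are independent, as the formula above does; the function name familywise_error_rate is made up for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# How quickly the chance of at least one false positive grows at alpha = 0.05.
for n in (1, 5, 20, 50):
    print(n, round(familywise_error_rate(0.05, n), 2))
```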

How to Calculate the Bonferroni Correction

The calculation for this post hoc test is actually very simple: it’s just the alpha level (α) divided by the number of tests you’re running (n).
Sample question: A researcher is testing 25 different hypotheses at the same time, using an alpha level of 0.05. What is the Bonferroni correction?
The Bonferroni correction is α/n = .05/25 = .002

For this set of 25 tests, you would reject the null only if your p-value was smaller than .002.
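
The sample question translates directly into Python. This is a minimal sketch; the function names bonferroni_threshold and bonferroni_reject and the three p-values are hypothetical, and the strict "smaller than" comparison follows the rule stated above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Corrected per-test cutoff: alpha divided by the number of tests."""
    return alpha / n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """True for each test whose p-value is smaller than the corrected cutoff."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# The sample question: 25 tests at alpha = 0.05.
print(bonferroni_threshold(0.05, 25))

# Hypothetical p-values from three tests; the cutoff is 0.05 / 3.
print(bonferroni_reject([0.001, 0.01, 0.04]))
```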

The Bonferroni Correction and Medical Testing

Matthew A. Napierala, MD points out how multiple tests affect physicians (and patients) in an article for the American Academy of Orthopaedic Surgeons (AAOS). “In contemporary orthopaedic research studies, numerous simultaneous tests are routinely performed.” This means that given enough tests, one of them is bound to come back as a false positive. Definitely not a good thing when we’re talking about health issues.

Post Hoc Test: References

AAOS. Research News. Retrieved January 1, 2020 from:
Levine, D. (2014). Even You Can Learn Statistics and Analytics: An Easy to Understand Guide to Statistics and Analytics (3rd ed.). Pearson FT Press.
Cook, T. (2005). Introduction to Statistical Methods for Clinical Trials (1st ed.). Chapman and Hall/CRC Texts in Statistical Science. Chapman and Hall/CRC.
Wheelan, C. (2014). Naked Statistics. W. W. Norton & Company.