What is Peirce’s Criterion?
Peirce’s Criterion is a method, proposed by Benjamin Peirce in 1852, for identifying and eliminating outliers from data sets.
An outlier is a data point that is abnormal and not representative of the general trend. Finding and eliminating outliers is an important part of much scientific research based on real-life measurements, and Peirce’s Criterion is one of the most theoretically rigorous ways to do this.
It’s also more generally applicable than other methods of eliminating outliers, including the more commonly used Chauvenet’s Criterion. It doesn’t make any arbitrary assumptions about when to reject data, and it can be used to eliminate more than one outlier.
How to use Peirce’s Criterion
Using Peirce’s Criterion in its original form is cumbersome, but it has been condensed into a set of tables and an easy-to-use formula that can be readily applied to data sets.
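R in these tables is a ratio: the maximum allowable deviation of a measurement from the sample mean, divided by the sample standard deviation. That is,
R = |X_{i} – X_{m}|_{max} / σ,
where X_{i} is a measurement, X_{m} is the sample mean, and σ is the sample standard deviation.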
Knowing R, then, we can calculate the value of |X_{i} – X_{m}| _{max} from the standard deviation.
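As a quick illustration of the relation |X_{i} – X_{m}|_{max} = σ R, the snippet below computes the maximum allowable deviation for one set of inputs. The function name `max_allowable_deviation` and the numbers are hypothetical, chosen only to show the arithmetic.

```python
def max_allowable_deviation(sigma, r):
    """Return sigma * R, the largest deviation from the mean that is not rejected."""
    return sigma * r

# Hypothetical inputs: a sample standard deviation of 2.0 and a table value R = 1.878.
limit = max_allowable_deviation(2.0, 1.878)
print(limit)
```

Any measurement that lies farther than `limit` from the sample mean would be treated as an outlier.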
To use this modern form of Peirce’s Criterion:
1. Find the mean and sample standard deviation for the entire set.
2. Look up the value of R in a table of Peirce’s Criterion values, using the row that corresponds to the number of observations in your sample set. Begin by assuming one outlier, although you may repeat the process to discover more than one.
3. Use the formula |X_{i} – X_{m}|_{max} = σ R to calculate the maximum allowable deviation.
4. Calculate the actual deviation, |X_{i} – X_{m}|, of each potential outlier.
5. Check whether |X_{i} – X_{m}| > |X_{i} – X_{m}|_{max}; if it is, eliminate that outlier.
6. Now assume two outliers, and go through steps 2 through 5 again. Keep the original number of measurements as well as the original values of the standard deviation and mean.
7. If your calculations in step 6 give you another outlier, repeat the process. Assume one additional outlier each time through, and use the original number of measurements, mean, and standard deviation each time.
8. Once all questionable data have been tested, calculate the mean and standard deviation again for your final data set.
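The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `peirce_reject` is made up for this example, the measurements are hypothetical, and the R values for 10 observations (1.878, 1.570, 1.380 for one, two, and three assumed outliers) are assumed from a commonly reproduced table, so check them against the table you are using.

```python
from statistics import mean, stdev

def peirce_reject(data, r_values):
    """Split data into (kept, rejected) using table-based R values.

    r_values[k - 1] is assumed to be the tabulated R for len(data)
    observations and k suspected outliers; the caller supplies it.
    """
    x_m = mean(data)      # mean of the full, original set
    sigma = stdev(data)   # sample standard deviation (n - 1 denominator)
    rejected = []
    for r in r_values:    # one pass per assumed number of outliers: 1, 2, 3, ...
        max_dev = sigma * r  # maximum allowable deviation for this pass
        flagged = [x for x in data if abs(x - x_m) > max_dev]
        if len(flagged) <= len(rejected):
            break         # this pass found no new outlier, so stop
        rejected = flagged
    kept = [x for x in data if x not in rejected]
    return kept, rejected

# Hypothetical measurements (N = 10) with two suspiciously high readings.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 13.5, 14.0]
# Assumed table values of R for N = 10 and 1, 2, 3 suspected outliers.
kept, rejected = peirce_reject(data, [1.878, 1.570, 1.380])
print("rejected:", rejected)
print("final mean:", mean(kept), "final sd:", stdev(kept))
```

Note that the mean and standard deviation of the original set are reused on every pass, and the statistics are recomputed only once, on the values that remain. The sketch stops as soon as a pass finds no new outlier, which is one reading of the procedure; adapt the stopping rule if your source table's instructions differ.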