# How to Find Outliers in Data: Easy Steps and Video


Outliers are stragglers — extremely high or extremely low values — in a data set that can throw off your stats. For example, if you were measuring children’s nose length, your average value might be thrown off if Pinocchio was in the class.

## What is an outlier?

An outlier is a piece of data that is an abnormal distance from other points. In other words, it’s data that lies outside the other values in the set. If you had Pinocchio in a class of children, the length of his nose compared to the other children would be an outlier.
In this set of random numbers, 1 and 201 are outliers:
1, 99, 100, 101, 103, 109, 110, 201
“1” is an extremely low value and “201” is an extremely high value.

Outliers aren’t always that obvious. Let’s say you received the following paychecks last month:
\$225, \$250, \$25, \$235.
Your average paycheck is \$183.75. But that small paycheck (\$25) might be because you went on vacation, so a weekly paycheck average of \$183.75 isn’t a true reflection of how much you earned. Your average is actually closer to \$237 if you take the outlier (\$25) out of the set.
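To see how much one small value drags the mean down, here is a minimal Python sketch that computes the average directly from the four paychecks listed above. The variable names are made up for illustration.

```python
from statistics import mean

# The four paychecks from the example, in dollars.
paychecks = [225, 250, 25, 235]

# Mean of all four paychecks, including the unusually small one.
mean_all = mean(paychecks)

# Mean after leaving out the $25 paycheck.
mean_without_outlier = mean([p for p in paychecks if p != 25])

print(mean_all)                     # 183.75
print(round(mean_without_outlier))  # 237
```

Dropping one value out of four shifts the mean by more than \$50, which is why it pays to check for outliers before quoting an average.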

Of course, trying to find outliers isn’t always that simple. Your data set may look like this:
61, 10, 32, 19, 22, 29, 36, 14, 49, 3.
You could take a guess that 3 might be an outlier and perhaps 61. But you’d be wrong: under the 1.5 × IQR rule described below, the cutoffs for this set are -19 and 69, so neither value counts as an outlier.
A box and whiskers chart (boxplot) often shows outliers:

The outlier on this boxplot is outside of the box and whiskers.

However, you may not have access to a box and whiskers chart. And even if you do, some boxplots may not show outliers. For example, this chart has whiskers that reach out to include outliers:

Therefore, don’t rely on finding outliers from a box and whiskers chart. That said, box and whiskers charts can be a useful tool to display them after you have calculated what your outliers actually are. The most effective way to find all of your outliers is by using the interquartile range (IQR). The IQR contains the middle bulk of your data, so outliers can be easily found once you know the IQR.

## How to Find Outliers Using the Interquartile Range (IQR)

An outlier is defined as any data point that lies more than 1.5 IQRs below the first quartile (Q1) or above the third quartile (Q3) in a data set.
High = Q3 + 1.5 × IQR
Low = Q1 – 1.5 × IQR

Watch this video on How To Find Outliers, or read the steps below:

Sample Question: Find the outliers for the following data set: 3, 10, 14, 22, 19, 29, 70, 49, 36, 32.

Step 1: Find the IQR, Q1 (25th percentile) and Q3 (75th percentile). Use our online interquartile range calculator to find the IQR or, if you want to calculate it by hand, follow the steps in this article: Interquartile Range in Statistics: How to find it.
IQR = 22
Q1 = 14
Q3 = 36

IQR, Q1 and Q3 found using the online calculator (see link in this step).
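If you would rather compute these values in code, the sketch below is one way to do it. It assumes the "median of each half" method for quartiles, which reproduces the numbers shown in this step; the article does not say which method the online calculator uses, and other tools (spreadsheets, statistics packages) may interpolate and give slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When the count is odd, the middle
    value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the median position
    upper = ordered[-half:]    # values above the median position
    return median(lower), median(upper)

data = [3, 10, 14, 22, 19, 29, 70, 49, 36, 32]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 14 36 22
```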

Step 2: Multiply the IQR you found in Step 1 by 1.5:
IQR * 1.5 = 22 * 1.5 = 33.

Step 3: Add the amount you found in Step 2 to Q3 from Step 1:
33 + 36 = 69.

This is your upper limit. Set this number aside for a moment.

Step 4: Subtract the amount you found in Step 2 from Q1 from Step 1:
14 – 33 = -19.
This is your lower limit. Set this number aside for a moment.

Step 5: Put the numbers from your data set in order:
3, 10, 14, 19, 22, 29, 32, 36, 49, 70

Step 6: Insert your low and high values into your data set, in order:
-19, 3, 10, 14, 19, 22, 29, 32, 36, 49, 69, 70

Step 7: Highlight any number below or above the numbers you inserted in Step 6:
-19, 3, 10, 14, 19, 22, 29, 32, 36, 49, 69, 70
The only value outside the limits is 70, so 70 is the outlier in this data set.

That’s it!
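The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: it takes Q1 = 14 and Q3 = 36 as given from Step 1 rather than computing them, and the variable names are assumptions.

```python
# Quartiles taken from Step 1 of the worked example.
q1, q3 = 14, 36
iqr = q3 - q1                    # 22

step = 1.5 * iqr                 # 33.0
upper_limit = q3 + step          # 69.0
lower_limit = q1 - step          # -19.0

# Put the data in order, then keep anything beyond either limit.
data = sorted([3, 10, 14, 22, 19, 29, 70, 49, 36, 32])
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit, outliers)  # -19.0 69.0 [70]
```

Comparing each value against the two limits does the same job as inserting the limits into the ordered list and looking at what falls outside them.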

## How to Find Outliers with the Tukey Method

Frequency chart with boxplot at the top. The outliers are shown as dots outside the range of the whiskers.

The Tukey method for finding outliers uses the interquartile range to filter out very large or very small numbers. It’s practically the same as the procedure above, but you might see the formulas written slightly differently and the terminology is a little different as well. For example, the Tukey method uses the concept of “fences”.

The formulas are:
Lower fence = Q1 – 1.5(Q3 – Q1) = Q1 – 1.5(IQR)
Upper fence = Q3 + 1.5(Q3 – Q1) = Q3 + 1.5(IQR)
Where:
Q1 = first quartile
Q3 = third quartile
IQR = Interquartile range

These equations give you two values, or “fences”. You can think of them as a fence that cordons off the outliers from all of the values that are contained in the bulk of the data.

Sample question: Use Tukey’s method to find outliers for the following set of data: 1,2,5,6,7,9,12,15,18,19,38.
Step 1: Find the Interquartile range:

1. Find the median: 1,2,5,6,7,9,12,15,18,19,38. The median is the middle value, 9.
2. Place parentheses around the numbers above and below the median — it makes Q1 and Q3 easier to find.
(1,2,5,6,7),9,(12,15,18,19,38)
3. Find Q1 and Q3. Q1 can be thought of as a median in the lower half of the data. Q3 can be thought of as a median for the upper half of data.
(1,2,5,6,7), 9, (12,15,18,19,38). Q1=5 and Q3=18.
4. Subtract Q1 from Q3. 18-5=13.

Step 2: Calculate 1.5 * IQR:
1.5 * IQR = 1.5 * 13 = 19.5

Step 3: Subtract from Q1 to get your lower fence:
5 – 19.5 = -14.5

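Step 4: Add to Q3 to get your upper fence:
18 + 19.5 = 37.5.

Step 5: Add your fences to your data to identify outliers:
(-14.5) 1,2,5,6,7,9,12,15,18,19,(37.5),38.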
Anything outside of the fences is an outlier. For this data set, 38 is the only outlier.

That’s how to find outliers with the Tukey method!
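For larger data sets you may want the whole Tukey procedure in one reusable function. The sketch below is a minimal version, assuming quartiles are found as the medians of the lower and upper halves (the method used by hand in Step 1). The name `tukey_fences` and the parameter `k` are hypothetical, not part of any library.

```python
from statistics import median

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for a list of numbers.

    Assumes at least two values; quartiles are medians of each half,
    with the middle value left out when the count is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [x for x in ordered if x < lower or x > upper]
    return lower, upper, outliers

print(tukey_fences([1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 38]))
# (-14.5, 37.5, [38])
```

The multiplier `k` is exposed only for convenience; this article uses 1.5 throughout.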

## How to Find Outliers with Advanced Methods

Reference: John Tukey, Exploratory Data Analysis, Addison-Wesley, 1977, pp. 43-44.


How to Find Outliers in Data: Easy Steps and Video was last modified: October 15th, 2017 by

# 5 thoughts on “How to Find Outliers in Data: Easy Steps and Video”

1. Abi

I’m surprised by the way you teach me, thanks very much; but I have only 4 problems on probability. Please could you help me?

2. Abi

Q1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population with probability density function given by
$f(x,\theta) = \frac{2(\theta - x)}{\theta^2}$ for $0 < x < \theta$; $0$ elsewhere

A, find an estimator $\hat{\theta}$ of $\theta$ by the method of moments
B, For n=98 andθ=6, provide an approximate P (1.72<X ̅0 0;elswhere
A, apply the Neyman-Pearson Lemma to obtain a method for testing hypothesis $H_0: \delta = 1$ against the alternative $H_1: \delta = 1$
B, Obtain the likelihood ratio for the hypothesis $H_0: \delta = 1$ against the alternative $H_1: \delta \neq 1$
C, In a sample size $n = 250$ it was found that the sample mean was 0.9. Use this data to test the hypothesis in b.
Q3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population with probability density function given by

$f(x;\theta) = \theta e^{-\theta x}$ for $0 < x < \infty$; $0$ elsewhere

A, setup the log-likelihood function
B, Find the maximum likelihood estimator $\hat{\theta}$ of $\theta$.
C, Find the Cramer-Rao Lower Bound for the variance of an unbiased estimator $\hat{\theta}$ of $\theta$.

3. Andale Post author

Abi, Can you post your question on our forum? (Click the link at the top of the page). One of our mods will be glad to help. Thanks!!