What is the modified Thompson Tau Test?
The modified Thompson Tau test is a way to find outliers in a data set. The data set must be a single variable (e.g. x1, x2, …, xn). One potential outlier is tested at a time using a version of the t-test. Roughly speaking, the Tau test flags points that lie more than about two standard deviations from the mean; the exact cutoff depends on the sample size.
As with most tests for outliers, there is a possibility that you could eliminate good data (especially if there is a cluster of outliers), so you should interpret the results of the test with caution.
Running the Test
In order to run the test, you first have to identify a possible outlier.
Example question: Are any of the following points outliers? 489, 490, 490, 491, 494, 499, 499, 500, 501, and 505.
Step 1: Find the sample mean. The mean for this set of data is 495.8.
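Step 2: Subtract the mean from the highest and lowest data points and take the absolute value of each difference. The largest deviation always belongs to one of these two extremes, so they are the only points you need to check. As a formula, that’s: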
δi = |xi – x̄|.
- |489 – 495.8| = 6.8
- |505 – 495.8| = 9.2
The point with the highest absolute difference (δ) is the suspected outlier. This is the one you’ll test. For this example, that’s 505, with δ = 9.2.
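The two steps above can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names (x_bar, deltas, suspect) are chosen for this example and do not come from the original text.

```python
from statistics import mean

data = [489, 490, 490, 491, 494, 499, 499, 500, 501, 505]

x_bar = mean(data)  # Step 1: sample mean

# Step 2: absolute deviation of every point from the mean
deltas = [abs(x - x_bar) for x in data]

# The suspected outlier is the point with the largest absolute deviation
suspect_index = max(range(len(data)), key=lambda i: deltas[i])
suspect = data[suspect_index]
delta = deltas[suspect_index]

print(f"mean = {x_bar:.1f}, suspect = {suspect}, delta = {delta:.1f}")
```

Computing the deviation for every point, rather than only the two extremes, gives the same suspect and avoids having to sort the data first.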
Step 3: Look up the sample size (n) in the Tau table below to get the Tau value (for the formula behind the table calculations see Tau Formula below):
For a sample size of 10, Tau is 1.7984.
Step 4: Calculate the standard deviation (s) for the sample. For this set of data, s = 5.67.
Step 5: Multiply Tau (Step 3) by s (Step 4):
Tau * s = 1.7984 * 5.67 = 10.2
Step 6: Compare the absolute difference (δ) for the suspected outlier (from Step 2) with Tau * s (Step 5).
If δ > Tau * s, the point is an outlier.
9.2 is not greater than 10.2, so 505 is not an outlier.
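As a rough check of the arithmetic, the comparison can be reproduced in Python. The sketch below assumes the Tau value 1.7984 quoted above for n = 10 and uses the sample standard deviation (n – 1 in the denominator); the names TAU_N10, threshold and is_outlier are illustrative only.

```python
from statistics import mean, stdev

data = [489, 490, 490, 491, 494, 499, 499, 500, 501, 505]
TAU_N10 = 1.7984  # Tau for n = 10, as quoted in the text

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (n - 1 in the denominator)

# Suspected outlier: the point farthest from the mean
suspect = max(data, key=lambda x: abs(x - x_bar))
delta = abs(suspect - x_bar)

threshold = TAU_N10 * s          # Tau * s
is_outlier = delta > threshold   # decision rule: delta > Tau * s

print(f"s = {s:.2f}, Tau * s = {threshold:.1f}, "
      f"delta = {delta:.1f}, outlier: {is_outlier}")
```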
Repeating the Steps
In the above example, the point with the largest absolute difference was not an outlier, so the test stops there. If the point had been an outlier, you would remove it from the data set and repeat the steps for the point that now has the largest deviation. Each time you repeat the calculations, recalculate the mean and standard deviation from the remaining points and find the new Tau for the reduced sample size.
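One possible way to express the repeated procedure is a loop that removes one point per pass. The sketch below is an assumption-laden illustration, not a reference implementation: the function name thompson_tau_outliers and the tau_by_n lookup are hypothetical, the sample data is the example set with 505 swapped for a made-up value of 530, and the Tau value for n = 9 (1.7770) was computed separately from the Tau formula rather than taken from the text.

```python
from statistics import mean, stdev

def thompson_tau_outliers(values, tau_by_n):
    """Repeatedly test the most extreme point; return (outliers, kept).

    tau_by_n maps a sample size n to its Tau value (hypothetical helper
    argument; a KeyError is raised if a needed size is missing).
    """
    kept = list(values)
    outliers = []
    while len(kept) >= 3:
        x_bar = mean(kept)   # recomputed after every removal
        s = stdev(kept)      # sample standard deviation, also recomputed
        suspect = max(kept, key=lambda x: abs(x - x_bar))
        if abs(suspect - x_bar) <= tau_by_n[len(kept)] * s:
            break            # most extreme point passes, so stop testing
        kept.remove(suspect)
        outliers.append(suspect)
    return outliers, kept

# Hypothetical data: the example set with 505 replaced by 530
sample = [489, 490, 490, 491, 494, 499, 499, 500, 501, 530]
# Tau for n = 10 is quoted in the text; n = 9 is an assumed, computed value
tau_values = {10: 1.7984, 9: 1.7770}

found, remaining = thompson_tau_outliers(sample, tau_values)
print(found, len(remaining))
```

In this made-up sample, 530 is flagged on the first pass; on the second pass the most extreme remaining point (501) falls inside Tau * s, so the loop stops.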
Tau Formula
If you’re using the table, you don’t really need the formula (unless you’re calculating some specific sample size not listed). Tau is calculated from T critical values of Student’s T-distribution, which identify a rejection region. The formula is:
Tau = [tα/2 * (n – 1)] / [√n * √(n – 2 + (tα/2)²)]
where:
- n = sample size
- tα/2 = Student’s T critical value (based on an alpha level of 5%) with n – 2 degrees of freedom
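For sample sizes not covered by a table, Tau can be computed directly from a t critical value. The following is a minimal sketch, assuming the standard form Tau = t(n – 1) / (√n · √(n – 2 + t²)) with t taken at α/2 = 0.025 and n – 2 degrees of freedom. The critical values are copied from a standard t-table and rounded to three decimals, and the names T_CRIT_025 and thompson_tau are illustrative.

```python
from math import sqrt

# Two-tailed 5% Student's t critical values (upper tail area 0.025),
# keyed by degrees of freedom; rounded values from a standard t-table.
T_CRIT_025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
              6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def thompson_tau(n):
    """Tau for sample size n (supports 3 <= n <= 12 with the table above)."""
    t = T_CRIT_025[n - 2]  # n - 2 degrees of freedom
    return t * (n - 1) / (sqrt(n) * sqrt(n - 2 + t * t))

print(f"{thompson_tau(10):.4f}")  # 1.7984
```

For n = 10 this reproduces the Tau value of 1.7984 used in the example. Larger sample sizes would need more t critical values, either from a fuller table or from a statistics library.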