
## What is a Heavy Tailed Distribution?

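A heavy tailed distribution is one whose tail (the probability of values far from the center) shrinks more slowly than that of any exponential distribution, so very large values and outliers show up far more often than they would under, say, a normal distribution. The heavier the tail, the larger the probability that you’ll get one or more very large values in a sample.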
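To make "heavier tail" concrete, here is a small Python sketch comparing the tail probability P(X > x) of a standard normal distribution with that of a standard Cauchy distribution (one of the heavy tailed examples listed later). Picking these two distributions and the function names is an illustrative choice; the tail formulas are the standard closed forms.

```python
import math


def normal_tail(x: float) -> float:
    """P(X > x) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def cauchy_tail(x: float) -> float:
    """P(X > x) for a standard Cauchy variable (location 0, scale 1)."""
    return 0.5 - math.atan(x) / math.pi


for x in (1, 3, 5, 10):
    n, c = normal_tail(x), cauchy_tail(x)
    # The ratio says how many times more likely the Cauchy is to exceed x.
    print(f"x={x:>2}  normal={n:.2e}  cauchy={c:.2e}  ratio={c / n:.1e}")
```

The normal tail shrinks roughly like e^(-x²/2), while the Cauchy tail shrinks only like 1/(πx), so the gap between them grows very quickly as x increases.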

**Characteristics of the Heavy Tailed Distribution:**

If you take a random sample from the distribution, you’re likely to end up with a sample made up mostly of small values. For example, if you sample the incomes of people in the United States, the bulk of your data will be around $50,000. However, one or two values in your sample could be extremely large (i.e. outliers); Bill Gates earned over $11 billion in 2013.
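The claim that a sample is made up mostly of small values can be checked with a quick simulation. This Python sketch assumes a lognormal distribution with parameters mu = 0 and sigma = 2 as a stand-in for a skewed quantity such as income (the parameters are illustrative, not fitted to real income data). It reports how often a single draw falls below the population mean, and how often the mean of a 20-value sample does.

```python
import math
import random
import statistics

# Illustrative (assumed) parameters of the underlying normal distribution.
MU, SIGMA = 0.0, 2.0
# Population mean of a lognormal distribution: exp(mu + sigma^2 / 2).
POP_MEAN = math.exp(MU + SIGMA**2 / 2)


def simulate(n_samples: int = 2000, sample_size: int = 20, seed: int = 1):
    """Return (share of draws below POP_MEAN, share of sample means below POP_MEAN)."""
    rng = random.Random(seed)
    draws_below = 0
    means_below = 0
    for _ in range(n_samples):
        sample = [rng.lognormvariate(MU, SIGMA) for _ in range(sample_size)]
        draws_below += sum(x < POP_MEAN for x in sample)
        if statistics.fmean(sample) < POP_MEAN:
            means_below += 1
    return draws_below / (n_samples * sample_size), means_below / n_samples


share_draws, share_means = simulate()
print(f"draws below population mean:        {share_draws:.1%}")
print(f"sample means below population mean: {share_means:.1%}")
```

With these parameters about 84% of individual values lie below the population mean (the probability Φ(σ/2) = Φ(1) that the underlying normal falls below mu + sigma²/2), so most samples miss the rare huge values that hold the population mean up, and their sample means come out too low.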

These large values tend to distort your sample statistics: the sample variance will probably be very large, and the sample mean usually underestimates the population mean, because most samples miss the rare extreme values that pull the population mean up. A couple of other quirks of heavy tails:

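- The classical Central Limit Theorem doesn’t apply when the variance is infinite, and even when the variance is finite, sample means can take a very long time to look normal.
- Some moments (such as the mean or the variance) may not exist, so order statistics, such as the median and other quantiles, are used instead.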
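One way to see the Central Limit Theorem break down is to watch how the spread of sample means changes as the sample size grows. This Python sketch (the helper names and the use of the interquartile range as the measure of spread are our own choices) draws sample means from a standard normal and a standard Cauchy distribution. For the normal, the spread shrinks roughly like 1/sqrt(n); for the Cauchy, whose mean and variance do not exist, the average of n independent values has the same standard Cauchy distribution as a single value, so the spread does not shrink at all.

```python
import math
import random
import statistics


def normal_draw(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def cauchy_draw(rng: random.Random) -> float:
    # Inverse-CDF method: tan(pi * (U - 0.5)) is standard Cauchy for U ~ Uniform(0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))


def spread_of_means(draw, n: int, reps: int = 400, seed: int = 0) -> float:
    """Interquartile range of `reps` sample means, each computed from `n` draws."""
    rng = random.Random(seed)
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for name, draw in (("normal", normal_draw), ("cauchy", cauchy_draw)):
    small, large = spread_of_means(draw, 10), spread_of_means(draw, 1000)
    print(f"{name}: IQR of means n=10 -> {small:.3f}, n=1000 -> {large:.3f}, ratio {large / small:.2f}")
```

Going from n = 10 to n = 1000 should cut the normal spread by about a factor of 10, while the Cauchy ratio stays near 1.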

## Heavy Tailed Distributions in the Real World

Many real-world situations are heavy tailed, including:

- The top 1% of the population in the USA owns as much as the bottom 90% (Guardian).
- File sizes in computers (Columbia).
- Web page sizes and computer systems’ workloads (Stanford).
- Insurance payouts and financial returns (Wolfram).

## Heavy Tailed Distribution Examples

Weibull distribution.

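This family of distributions is used in reliability engineering to model failure times and in life data analysis. A Weibull distribution is heavy tailed only when its shape parameter is less than 1; with a shape parameter of exactly 1 it reduces to the exponential distribution.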
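The role of the shape parameter shows up in the Weibull survival function, P(X > x) = exp(-(x/scale)^shape). This Python sketch (the function name and the parameter values are illustrative) compares shapes 0.5, 1 (the exponential case) and 2 at a large value of x, then checks the shape-0.5 tail against a simulation using the standard library's `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` is the shape.

```python
import math
import random


def weibull_tail(x: float, shape: float, scale: float = 1.0) -> float:
    """P(X > x) for a Weibull(shape, scale) variable."""
    return math.exp(-((x / scale) ** shape))


x = 20.0
for shape in (0.5, 1.0, 2.0):
    print(f"shape={shape}: P(X > {x:g}) = {weibull_tail(x, shape):.3e}")

# Monte Carlo check of the heavy tailed (shape < 1) case.
rng = random.Random(3)
draws = [rng.weibullvariate(1.0, 0.5) for _ in range(100_000)]
empirical = sum(d > x for d in draws) / len(draws)
print(f"simulated P(X > {x:g}) for shape=0.5: {empirical:.4f}")
```

With shape 0.5 the chance of exceeding 20 is a little over 1%, versus about 2 in a billion for the exponential case.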

Cauchy distribution.

The Cauchy distribution is symmetric and bell-shaped like the normal distribution, but its tails are much fatter. It is widely known for the fact that its expected value (and therefore its variance) does not exist.
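A short calculation shows why the expected value does not exist. Using the standard Cauchy density (location 0, scale 1), the integral of x f(x) over the positive half-line grows without bound:

$$
f(x) = \frac{1}{\pi\,(1 + x^2)}, \qquad
\int_0^T x\, f(x)\, dx = \frac{1}{2\pi}\,\ln\!\left(1 + T^2\right) \;\longrightarrow\; \infty \quad \text{as } T \to \infty .
$$

By symmetry the negative half diverges to minus infinity, so E[X] would have to be ∞ − ∞, which is undefined.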

Lognormal distribution.

A lognormal (log-normal or Galton) distribution is a probability distribution with a normally distributed logarithm.
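The definition can be checked directly: exponentiate draws from a normal distribution, and the logarithms of the results should again look normal. This Python sketch uses assumed parameters mu = 1 and sigma = 0.5 and the standard library's `random.lognormvariate(mu, sigma)`, whose arguments are the mean and standard deviation of the underlying normal distribution (not of the lognormal values themselves).

```python
import math
import random
import statistics

MU, SIGMA = 1.0, 0.5  # assumed parameters of the underlying normal distribution
N = 20_000
rng = random.Random(42)

# Two equivalent constructions of lognormal values.
via_exp = [math.exp(rng.gauss(MU, SIGMA)) for _ in range(N)]
via_stdlib = [rng.lognormvariate(MU, SIGMA) for _ in range(N)]

for label, values in (("exp(normal)", via_exp), ("lognormvariate", via_stdlib)):
    logs = [math.log(v) for v in values]
    print(f"{label:>15}: mean of logs = {statistics.fmean(logs):.3f}, "
          f"sd of logs = {statistics.stdev(logs):.3f}, "
          f"median = {statistics.median(values):.3f} (theory: exp(mu) = {math.exp(MU):.3f})")
```

Both constructions should give logs with mean close to 1 and standard deviation close to 0.5, and a median close to e.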


If the Central Limit Theorem doesn’t work for heavy tails, how do you estimate the mean of a heavy tailed dataset from the sample mean?

Because the distribution has a lot of outliers, the sample mean is not a good estimate. Use a resistant measure of central tendency instead, such as the median.
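To illustrate this advice, the following Python sketch draws Cauchy samples with an assumed location of 5 and scale of 1 and compares the sample mean with the sample median as the sample grows. Because the Cauchy distribution is symmetric, its median equals its location parameter, so the median has a well-defined target even though the mean does not exist. For skewed distributions, keep in mind that the median estimates the population median, which is generally not the same as the population mean.

```python
import math
import random
import statistics

LOCATION, SCALE = 5.0, 1.0  # assumed parameters for illustration


def cauchy_sample(n: int, rng: random.Random):
    # Inverse-CDF method: location + scale * tan(pi * (U - 0.5)), U ~ Uniform(0, 1).
    return [LOCATION + SCALE * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


rng = random.Random(7)
results = {}
for n in (100, 10_000, 100_000):
    sample = cauchy_sample(n, rng)
    results[n] = (statistics.fmean(sample), statistics.median(sample))
    print(f"n={n:>7}: mean = {results[n][0]:10.3f}   median = {results[n][1]:.3f}")
```

The median should settle near 5 as n grows, while the sample mean can jump by large amounts whenever an extreme value lands in the sample; with a different seed the means may look quite different.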