## What is the Pareto Distribution?

The Pareto distribution is a skewed distribution with heavy, or “slowly decaying,” tails (i.e. a relatively large share of the data lies in the tails).

The Pareto distribution (named after the 19th-century Italian economist Vilfredo Pareto) is defined by a shape parameter, α (also called a slope parameter or Pareto Index), and a scale parameter, k, which is the smallest possible value (some texts call this a location parameter). It has two main applications:

- To model the distribution of incomes.
- To model the distribution of city populations.

However, it can be used in a variety of other situations. For example, it can be used to model the lifetime of a manufactured item with a certain warranty period.

The cumulative distribution function (CDF) of the Pareto distribution is:

$$F(x) = 1 - \left(\frac{k}{x}\right)^{\alpha}, \quad x \ge k$$

where

x is the random variable

k is the lower bound of the data

α is the shape parameter
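
As a minimal sketch, the formula above can be written as a small Python function. The function name `pareto_cdf` and the input checks are illustrative choices, not part of the original definition; values below the lower bound k are given probability 0.

```python
def pareto_cdf(x: float, k: float, alpha: float) -> float:
    """Return P(X <= x) for a Pareto distribution.

    k is the lower bound of the data and alpha is the shape parameter,
    matching the symbols in the formula above.
    """
    if k <= 0 or alpha <= 0:
        raise ValueError("k and alpha must both be positive")
    if x < k:
        # No probability mass lies below the lower bound.
        return 0.0
    return 1.0 - (k / x) ** alpha


# With k = 1 and alpha = 1, half of the values fall at or below x = 2.
print(pareto_cdf(2.0, 1.0, 1.0))  # 0.5
```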

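You might also see the distribution written in terms of its probability density function, with different symbols:

$$f(x) = \frac{k\lambda^{k}}{x^{k+1}}, \quad x \ge \lambda$$

When used to model income distribution, this particular version of the formula has λ as the minimum income and k as the shape parameter, which describes how income is distributed above that minimum.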

## The Survival Function

Most texts on the Pareto distribution mention a “survival function,” which is sometimes also called a tail function or reliability function. This is just the probability of observing a value greater than x, that is, 1 minus the cumulative distribution function. For example, you may be looking at household income in the United States and want to know what proportion of households have an income greater than $1,000,000.
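
The survival function can be sketched in the same way. In the example below, the function name `pareto_sf`, the minimum income of $20,000, and the shape parameter of 1.5 are all made-up values chosen only for illustration; they are not estimates for the United States.

```python
def pareto_sf(x: float, k: float, alpha: float) -> float:
    """Return P(X > x) for a Pareto distribution with lower bound k and shape alpha."""
    if k <= 0 or alpha <= 0:
        raise ValueError("k and alpha must both be positive")
    if x < k:
        # Every value is at least k, so the tail probability is 1.
        return 1.0
    return (k / x) ** alpha


# Hypothetical parameters (assumptions for illustration only).
min_income = 20_000
shape = 1.5

share_above_million = pareto_sf(1_000_000, min_income, shape)
print(f"{share_above_million:.4%} of households exceed $1,000,000")
```

Under these assumed parameters, only a fraction of a percent of households would exceed the threshold; a smaller shape parameter would give a heavier tail and a larger proportion.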

## The Pareto Principle

The Pareto Principle is derived from the Pareto distribution and is used to illustrate that many things are not distributed evenly. Originally stated as 20% of the population holding 80% of the wealth, it can be applied more generally, and the split does not have to be 80/20. For example, in a more extreme case, 1% of the population might hold 99% of the wealth. The principle can also describe any situation where outcomes are not evenly distributed. For example, the top 20% of workers might produce 80% of output.
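
To connect the 80/20 rule to the distribution's shape parameter, here is a minimal sketch. It relies on a standard result that is not stated in this article: for a Pareto distribution with α > 1, the top fraction p of the population holds a share p^(1 − 1/α) of the total. The function name `top_share` is an illustrative choice.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p of a Pareto population.

    Uses the Lorenz-curve result share = p ** (1 - 1/alpha), which only
    applies when alpha > 1 (otherwise the mean is infinite).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return p ** (1.0 - 1.0 / alpha)


# The shape parameter that reproduces an exact 80/20 split is log base 4 of 5.
alpha_80_20 = math.log(5) / math.log(4)

print(round(alpha_80_20, 3))                   # 1.161
print(round(top_share(0.20, alpha_80_20), 3))  # 0.8
```

Smaller values of α (closer to 1) give more extreme splits, while larger values give a more even distribution.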
