Sampling >

Importance sampling is a way to predict the **probability of a rare event**. Along with Markov Chain Monte Carlo, it is the primary simulation tool for generating models of hard-to-define probability distributions.

Rare events can usually be found on the tails of probability distributions. For example, on a bell curve for IQ, the Albert Einsteins of the world are found above three standard deviations from the mean. The rarity of finding results like this makes it extremely difficult to sample large enough numbers for any meaningful statistical analysis. In addition, the probability distribution for these rare events are going to look markedly different from the bell curve. Although predicting when another Einstein might be born is probably not that critical, predicting other rare events — like fatigue in engineering structures or landfall for category 5 hurricanes — can be a matter of life and death.

As well as finding probabilities in tails, importance sampling can also be used to find expectations of random functions.

## Biasing Density

One way to produce large enough samples is to **change the probability density function** to generate more rare events. This alternate density function is derived from the original function of interest (in the above example, the bell curve) and is usually called the *biasing density*. The end goal is to reduce the variance of your estimates. The basic steps are:

- Choose a model for the process you want to study (i.e. derive the biasing density function and define the model’s parameters (e.g. the mean and variance)),
- Draw random samples from the parameterized model,
- Run your statistical analysis on the biasing density function,
- Modify those results to reflect the changes you made to the probability distribution.
- Analyze the output.

## Importance Sampling and Monte Carlo Procedures

Importance sampling **speeds up Monte Carlo procedures for rare events **(a “Monte Carlo procedure” is sampling based on random walks). As it speeds up the process, it’s sometimes referred to as “fast simulation using importance sampling.” It’s also called a “forced Monte Carlo procedure” because it’s forcing the Monte Carlo procedure to behave somewhat abnormally.

If you’re using Monte Carlo procedures, you’re more than likely using software because of the large number of computations involved. Many statistical software packages include Monte Carlo algorithms, including Minitab, R and SPSS.

## Formulas

The formulas behind Importance Sampling are somewhat esoteric, mainly because of the calculus involved. As a (relatively) simple example, let’s say you wanted to create an expectation for some function, f:

μ_{f} = ℰ_{p}[f(X)], with

_{f}= ∫ f(x)p(x)dx

Then for any probability density function q(x) that satisfies q(x) > 0 when f(x)p(x)≠ 0, you have:

_{f}= ℰ

_{q}[w(X)f(X)]

Where:

- w(x) = p(x)/q(x)
- ℰ
_{q}[] = expectation with respect to q(x).

You can now use a sample of independent draws from q(x) to estimate μ_{f} by

**References:**

Neal, R. M. (2001). Annealed Importance Sampling. Statistics and

Computing (11) 125-139.

Oh, M.-S. and Berger, J. O. (1992). Adaptive Importance Samplin in

Monte Carlo Integration. Journal of Statistical Computation and Simulation

(41) 143-168.

Srinivasan, R. (2013). Importance Sampling: Applications in Communications and Detection. Springer Science & Business Media.

Tokdar, S. & Kass, R. (2009). Importance Sampling: A Review. Retrieved 8/18/2017 from: http://www2.stat.duke.edu/~st118/Publication/impsamp.pdf

**Need help with a specific statistics question?** Chegg offers 30 minutes of free tutoring, so you can try them out before committing to a subscription. Click here for more details.

If you prefer an **online interactive environment** to learn R and statistics, this *free R Tutorial by Datacamp* is a great way to get started. If you're are somewhat comfortable with R and are interested in going deeper into Statistics, try *this Statistics with R track*.

*Facebook page*.