## What is a Bootstrap Sample?

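A **bootstrap sample** is a sample that is “bootstrapped” from an existing sample. Bootstrapping is a type of *re*sampling where large numbers of samples of the same size are repeatedly drawn, with replacement, from a single original sample. In the standard bootstrap each resample is the same size as the original sample, although smaller resamples can also be drawn (as in the simple example below).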

For example, let’s say your sample was made up of ten numbers: 49, 34, 21, 18, 10, 8, 6, 5, 2, 1. You randomly draw three numbers: 5, 1, and 49. You then return those numbers to the sample and draw three numbers again. Repeat the process of drawing *x* numbers *B* times. Usually, original samples are much larger than this simple example, and *B* can reach into the thousands. A statistic (e.g. the mean) is calculated for each resample and, after a large number of iterations, these bootstrap statistics are compiled into a **bootstrap distribution**. Because you’re putting your numbers back into the pot, the same item can be drawn again and again (e.g. 49 could appear in a dozen different resamples).
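The drawing process in this example can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `draw_resamples`, the seed, and the choice of the mean as the statistic are assumptions made for the sketch, not part of the example above.

```python
import random

# The ten-number sample from the example.
sample = [49, 34, 21, 18, 10, 8, 6, 5, 2, 1]

def draw_resamples(data, size, B, seed=None):
    """Draw B resamples of `size` items each, with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # rng.choices samples with replacement, so the same item can be drawn repeatedly.
    return [rng.choices(data, k=size) for _ in range(B)]

# Draw x = 3 numbers, B = 1000 times.
resamples = draw_resamples(sample, size=3, B=1000, seed=0)

# One statistic per resample; together these form the bootstrap distribution.
bootstrap_means = [sum(r) / len(r) for r in resamples]
```

Each entry of `resamples` plays the role of one draw such as 5, 1, 49, and `bootstrap_means` holds one bootstrap statistic per draw.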

Bootstrapping is loosely based on the law of large numbers, which states that if you sample over and over again, your data should approximate the true population data. This works, perhaps surprisingly, even when you’re using a single sample to generate the data.

- An *empirical* bootstrap sample is drawn from observations.
- A *parametric* bootstrap sample is drawn from a parameterized distribution (e.g. a normal distribution).
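The difference between the two kinds of bootstrap sample can be shown with a short sketch. It assumes, purely for illustration, that a normal distribution is fitted to the data by its sample mean and standard deviation; whether a normal model suits a given data set is a separate question.

```python
import random
import statistics

observations = [49, 34, 21, 18, 10, 8, 6, 5, 2, 1]
rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = len(observations)

# Empirical bootstrap: resample the observed values themselves, with replacement.
empirical_sample = rng.choices(observations, k=n)

# Parametric bootstrap: estimate the parameters, then draw from the fitted model.
# Assumption for this sketch: the model is a normal distribution.
mu = statistics.mean(observations)
sigma = statistics.stdev(observations)
parametric_sample = [rng.gauss(mu, sigma) for _ in range(n)]
```

The empirical sample can only contain values that were actually observed, while the parametric sample can contain any value the fitted distribution allows.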

## Why Resample?

Ideally, you would want to draw **large, non-repeated samples from a population** in order to create a sampling distribution for a statistic. However, you may be limited to one sample because of finances or time. This single sample can serve as a mini population, from which repeated samples are drawn with replacement over and over again. As well as saving time and money, bootstrapped samples can give quite good approximations of population parameters.

## Running the Procedure

Bootstrapping is usually performed with software (e.g. Stata or the R Bootstrap package). The process generally follows three steps:

- Resample a data set *B* times,
- Find a summary statistic (called a **bootstrap statistic**) for each of the *B* samples,
- Estimate the standard error for the bootstrap statistic using the standard deviation of the bootstrap distribution.
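The three steps above can be sketched with the Python standard library. The function name `bootstrap_standard_error`, the default number of resamples, and the seed are hypothetical choices for this illustration. The sketch also assumes each resample is the same size as the data, which is a common convention rather than something the steps require.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, B=2000, seed=None):
    """Return (standard error estimate, list of bootstrap statistics)."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 1 and 2: resample with replacement and compute the statistic each time.
    boot_stats = [statistic(rng.choices(data, k=n)) for _ in range(B)]
    # Step 3: the standard deviation of the bootstrap distribution
    # is the estimated standard error of the statistic.
    return statistics.stdev(boot_stats), boot_stats

data = [49, 34, 21, 18, 10, 8, 6, 5, 2, 1]
se, boot_stats = bootstrap_standard_error(data, statistics.mean, B=2000, seed=42)
print(f"Bootstrap standard error of the mean: {se:.2f}")
```

Any function that maps a list of numbers to a single number (for example `statistics.median`) can be passed in place of `statistics.mean`.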

## Notation

- The number of bootstrap samples can be indicated with B (e.g. if you resample 10 times then B = 10).
- A bootstrap sample is identified by “star” notation: $x^*_1, x^*_2, \ldots, x^*_n$. This is similar to the notation for sample data, which is traditionally denoted by $x_1, x_2, \ldots, x_n$.
- A star next to a statistic, like $s^*$ or $\bar{x}^*$, indicates the statistic was calculated by resampling. A bootstrap statistic is sometimes denoted with a T, where $T^*_b$ would be the statistic T calculated from the $b$-th bootstrap sample.
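With this notation, the standard error estimate from the third step of the procedure can be written out explicitly. It is the ordinary sample standard deviation of the B bootstrap statistics. The symbols $\bar{T}^*$ (the average of the bootstrap statistics) and $\widehat{se}_B$ are labels introduced here for illustration.

$$
\bar{T}^* = \frac{1}{B}\sum_{b=1}^{B} T^*_b,
\qquad
\widehat{se}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(T^*_b - \bar{T}^*\right)^2}
$$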

## Bootstrap Percentile Method

The **bootstrap percentile method** is a way to calculate confidence intervals for bootstrapped samples.

With the **simple method**, a certain percentage (e.g. 5% or 10%) is trimmed from the lower and upper ends of the bootstrap distribution of the statistic (e.g. the mean or standard deviation). How much you trim depends on the confidence interval you’re looking for. For example, a 90% confidence interval would call for a 100% - 90% = 10% trim (i.e. 5% from each end). Or, put another (slightly more technical) way, you can get a 90% confidence interval by taking the 5% quantile as the lower bound and the 95% quantile as the upper bound of the B replicates $T^*_1, T^*_2, \ldots, T^*_B$.

A **more complicated method** is Efron’s BCa method (see DiCiccio and Efron, 1996), which stands for bias-corrected and accelerated. As well as adjusting for bias, it also corrects for skewness in the bootstrap distribution. Other variants include Rubin’s Bayesian extension and DiCiccio and Efron’s ABC method.

With the simple percentile method, the **trimmed range** of bootstrap statistics is the confidence interval for the population parameter of interest.
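The simple percentile method can be sketched as follows. The function name `percentile_interval`, the defaults, and the seed are hypothetical. The sketch picks the bounds by counting order statistics in the sorted bootstrap distribution, which is only one of several quantile conventions, and it assumes resamples of the same size as the data. It does not implement BCa or the other variants.

```python
import random
import statistics

def percentile_interval(data, statistic, confidence=0.90, B=5000, seed=None):
    """Percentile bootstrap confidence interval (simple method only)."""
    rng = random.Random(seed)
    n = len(data)
    # Sorted bootstrap distribution of the statistic.
    boot_stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(B))
    alpha = 1.0 - confidence  # total share to trim, e.g. 10% for a 90% interval
    # Trim alpha/2 of the replicates from each end (indices clamped to the list).
    lower_index = min(max(round((alpha / 2) * B), 0), B - 1)
    upper_index = min(max(round((1 - alpha / 2) * B) - 1, 0), B - 1)
    return boot_stats[lower_index], boot_stats[upper_index]

data = [49, 34, 21, 18, 10, 8, 6, 5, 2, 1]
low, high = percentile_interval(data, statistics.mean, confidence=0.90, B=5000, seed=7)
print(f"90% percentile interval for the mean: ({low:.1f}, {high:.1f})")
```

With `confidence=0.90` and `B=5000`, the bounds are the 251st and 4750th smallest bootstrap statistics, so 5% of the replicates are trimmed from each end.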

**References:**

DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11, 189–228.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Rubin, D. (1981). The Bayesian bootstrap. Annals of Statistics, 9, 130–134.
