
## What is Inverse Sampling?

In inverse sampling (sometimes called standard inverse sampling), you continue to choose items **until an event has occurred a specified number of times.** It is often used when you don’t know the exact size of the sample you want to take. For example, let’s say you were conducting a wildlife management survey and wanted to capture 20 banded birds. You capture birds at random until you have collected 20 banded birds (and an unknown number of unbanded birds). The resulting sample size could be 100 birds, or it could be 88 birds, or 203 birds and so on.
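The bird example can be sketched as a small simulation. This is a minimal illustration, not a field protocol: it assumes each captured bird is banded independently with the same probability, and the value `p_banded=0.2` and the function name `inverse_sample` are assumptions made up for the example.

```python
import random

def inverse_sample(p_banded, target, rng):
    """Capture birds one at a time until `target` banded birds are caught.

    Returns the total number of birds captured (banded + unbanded).
    Assumes every capture is banded independently with probability p_banded.
    """
    banded = 0
    total = 0
    while banded < target:
        total += 1
        if rng.random() < p_banded:  # this capture is a banded bird
            banded += 1
    return total

rng = random.Random(42)  # fixed seed so the run is repeatable
# Repeat the survey five times; the final sample size differs each time.
sizes = [inverse_sample(p_banded=0.2, target=20, rng=rng) for _ in range(5)]
print(sizes)
```

Each run stops at exactly 20 banded birds, but the total number of captures varies from run to run, which is the defining feature of the design.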

When each draw has only two possible outcomes (the event either happens or is *not* going to happen), this type of sampling is called *negative binomial sampling*. The two terms (inverse sampling and negative binomial sampling) are often used interchangeably. However, inverse sampling can refer to sampling based on either the negative binomial distribution (in which case it is called negative binomial sampling) or the Poisson distribution. If sampling is performed based on the Poisson distribution, you’re interested in events occurring in a certain time period. For example, you might be interested in how many accidents happen on a particular road over the period of one week.
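For the negative binomial case, the total sample size needed to reach a fixed number of successes follows a negative binomial distribution. The sketch below assumes independent draws with a constant success probability `p`; the values `k = 20` and `p = 0.2` are illustrative assumptions, not figures from the text.

```python
from math import comb

def neg_binomial_pmf(n, k, p):
    """P(the k-th success occurs on draw n), for independent draws.

    The last draw must be a success, and the first n - 1 draws must
    contain exactly k - 1 successes.
    """
    if n < k:
        return 0.0  # cannot collect k successes in fewer than k draws
    return comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)

k, p = 20, 0.2  # assumed: 20 banded birds wanted, 20% of birds banded

# Expected total sample size is k / p.
print("expected sample size:", k / p)

# Chance that the survey needs more than 150 captures.
p_over_150 = 1 - sum(neg_binomial_pmf(n, k, p) for n in range(k, 151))
print("P(sample size > 150):", round(p_over_150, 4))
```

The expected sample size `k / p` grows quickly as the event becomes rarer, which is why the final sample size is hard to budget for in advance.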

## Types and Designs

Paul Lavrakas outlines two **types of inverse sampling** in the *Encyclopedia of Survey Research Methods*:

**Vector-at-a-time sampling** involves drawing two observations at the same time (one from each of two populations). Sampling stops when x Successes are obtained from one of the populations.

**Play-the-winner sampling** is where one item at a time is drawn from one of the (randomly selected) populations. Sampling continues in that population until a Failure happens, then switches to the second population and continues there until a Failure happens. Sampling keeps switching back and forth between the two populations in this way until a pre-specified number of Successes has occurred.
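Both stopping rules can be sketched in a few lines. This is a simplified illustration that assumes each draw is an independent Success/Failure trial; the function names, population labels, and success probabilities are hypothetical.

```python
import random

def vector_at_a_time(p_a, p_b, x, rng):
    """Draw one observation from each population per step.

    Stops as soon as either population has produced x Successes.
    Returns (number of pairs drawn, successes in A, successes in B).
    """
    succ_a = succ_b = pairs = 0
    while succ_a < x and succ_b < x:
        pairs += 1
        succ_a += rng.random() < p_a  # True counts as 1
        succ_b += rng.random() < p_b
    return pairs, succ_a, succ_b

def play_the_winner(p_success, target_successes, rng):
    """Stay with the current population after a Success, switch after a Failure.

    p_success maps two population names to assumed success probabilities.
    Stops once the total number of Successes reaches target_successes.
    Returns the number of draws taken from each population.
    """
    names = list(p_success)
    current = rng.choice(names)  # starting population is chosen at random
    draws = {name: 0 for name in names}
    successes = 0
    while successes < target_successes:
        draws[current] += 1
        if rng.random() < p_success[current]:
            successes += 1  # Success: keep sampling the same population
        else:
            # Failure: switch to the other population
            current = names[1] if current == names[0] else names[0]
    return draws

rng = random.Random(7)
print(vector_at_a_time(p_a=0.6, p_b=0.4, x=10, rng=rng))
print(play_the_winner({"A": 0.8, "B": 0.3}, target_successes=30, rng=rng))
```

In the play-the-winner run, the population with the higher success probability tends to receive more of the draws, because sampling only leaves a population after a Failure.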

The two basic methods for Inverse Sampling are **Multiple Inverse Sampling (MIS) and General Inverse Sampling (GIS).**

**MIS** (originally proposed by Chang and colleagues) is used when sub-population sizes are known. Its main advantage is that it avoids the problem of empty strata in post-stratification. In post-stratification, the sample is selected, then split into strata (for example, by sex, age, or other characteristics). Some of the strata may end up with zero entries, especially if the category is rare. MIS ensures that at least *n* items will end up in each stratum (sampling is continued until a specified number of observations has been reached in each stratum).

**GIS** avoids sampling an “infeasible number of units” (Salehi & Seber). Sampling is performed until the strata contain a pre-specified number of units *or* until a maximum sample size has been reached.
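The GIS stopping rule described above can be sketched as follows. This is a simplified illustration, not the published procedure of Salehi & Seber: it assumes simple random sampling without replacement, and the function name, the `(unit_id, stratum)` data layout, and all numbers are hypothetical.

```python
import random

def general_inverse_sample(units, strata, min_per_stratum, max_size, rng):
    """Draw units at random without replacement.

    Stops when every stratum holds at least min_per_stratum sampled units,
    or when max_size units have been drawn, whichever comes first.
    units is a list of (unit_id, stratum) pairs.
    """
    order = units[:]
    rng.shuffle(order)  # a random order is equivalent to drawing one at a time
    counts = dict.fromkeys(strata, 0)
    sample = []
    for unit in order:
        if len(sample) >= max_size or min(counts.values()) >= min_per_stratum:
            break
        sample.append(unit)
        counts[unit[1]] += 1
    return sample, counts

# Hypothetical population: 950 common units and 50 rare ones.
population = [(i, "common") for i in range(950)] + [(i, "rare") for i in range(950, 1000)]

rng = random.Random(3)
sample, counts = general_inverse_sample(
    population, ["common", "rare"], min_per_stratum=5, max_size=400, rng=rng
)
print(len(sample), counts)
```

Setting `max_size` to the population size removes the cap, so sampling continues until every stratum is filled; that behaves like the MIS stopping rule under the same simplifying assumptions.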

## Applications

**Inverse sampling is often performed when a certain characteristic is rare**. For example, it is a good method for detecting differences between two different treatments for a rare disease; it avoids the problem of sparse data due to a disease’s rarity. According to Lavrakas, play-the-winner sampling is preferred for clinical trials because the sample from the poorer-performing population will always be smaller than the sample from the “better” population.

## Advantages and Disadvantages

In general, inverse sampling will give you more precise estimates than direct sampling (Scheaffer et al., 2011), as long as the sample size *n* required to obtain the target number of individuals is small compared to the population size *N*. However, because the sample size is unknown in advance and could theoretically be infinite (in some cases), this technique can be costly, labor intensive, and time consuming. Compared to random sampling, estimated variances are usually much larger.
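The text above does not give an estimator, so the following is an added illustration rather than the article's own method. A standard textbook result for inverse sampling with independent draws is that, after *n* draws to reach *k* successes, (k − 1)/(n − 1) is an unbiased estimate of the proportion, while the naive k/n runs slightly high. The sketch checks this by simulation; the values `p = 0.05` and `k = 5` are assumptions chosen to mimic a rare characteristic.

```python
import random

def draws_until_k_successes(p, k, rng):
    """Return the number of independent draws needed to reach k successes."""
    n = successes = 0
    while successes < k:
        n += 1
        if rng.random() < p:
            successes += 1
    return n

p_true, k, runs = 0.05, 5, 5000  # assumed values for illustration
rng = random.Random(11)

naive_total = unbiased_total = 0.0
for _ in range(runs):
    n = draws_until_k_successes(p_true, k, rng)
    naive_total += k / n                # tends to overestimate p
    unbiased_total += (k - 1) / (n - 1)  # unbiased under these assumptions

print("true p:          ", p_true)
print("mean of k/n:     ", round(naive_total / runs, 4))
print("mean (k-1)/(n-1):", round(unbiased_total / runs, 4))
```

The gap between the two averages shrinks as `k` grows, so the correction matters most when the target number of successes is small.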

**References:**

Lavrakas, P. J. *Encyclopedia of Survey Research Methods*.

Chang, K.-C. et al. MIS sampling in post stratification.

Salehi, M. & Seber, G. A general inverse sampling scheme and its application to adaptive cluster sampling.

Scheaffer et al. (2011). *Elementary Survey Sampling*. Cengage Learning.
