## Sampling with Replacement

Sampling with replacement means you put each item back before you choose the next one. It is the setting for **probability with replacement**: you want to find the probability of some event where there’s a number of balls, cards or other objects, and you replace the item each time you choose one.

Let’s say you had a population of 7 people, and you wanted to sample 2. Their names are:

- John
- Jack
- Qiu
- Tina
- Hatty
- Jacques
- Des

To **sample with replacement**, you would choose one person’s name, put that person’s name back in the hat, and then choose another name. The possibilities for your two-name sample are:

- John, John
- John, Jack
- John, Qiu
- Jack, Qiu
- Jack, Tina
- …and so on.

When you sample with replacement, your two items are independent. In other words, one does not affect the outcome of the other. You have a 1 out of 7 (1/7) chance of choosing any particular name on the first draw and a 1/7 chance of choosing any particular name on the second draw.

- P(John, John) = (1/7) * (1/7) = 1/49 ≈ .02.
- P(John, Jack) = (1/7) * (1/7) = 1/49 ≈ .02.
- P(John, Qiu) = (1/7) * (1/7) = 1/49 ≈ .02.
- P(Jack, Qiu) = (1/7) * (1/7) = 1/49 ≈ .02.
- P(Jack, Tina) = (1/7) * (1/7) = 1/49 ≈ .02.

Note that P(John, John) just means “the probability of choosing John’s name, and then John’s name again.” You can figure out these probabilities using the multiplication rule.
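
The multiplication rule for this example can be checked with a short Python sketch. It assumes every name is equally likely on each draw; the variable names (`names`, `p_single`, `with_replacement`) are only illustrative.

```python
from fractions import Fraction
from itertools import product

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]

# Each draw picks any one of the 7 names with probability 1/7,
# because the name goes back in the hat after every draw.
p_single = Fraction(1, len(names))

# All ordered two-name samples, repeats allowed: 7 * 7 = 49 outcomes.
# Independence means each pair's probability is the product of the two draws.
with_replacement = {
    pair: p_single * p_single for pair in product(names, repeat=2)
}

print(len(with_replacement))                      # 49
print(with_replacement[("John", "John")])         # 1/49
print(float(with_replacement[("John", "Jack")]))  # roughly 0.0204
print(sum(with_replacement.values()))             # 1
```

Using `Fraction` keeps the arithmetic exact, so the 49 probabilities add up to exactly 1.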

But what happens if you don’t replace the first name before you choose the second? In other words, what happens if you sample without replacement?

## Sampling Without Replacement

Sampling without replacement is a way to figure out **probability without replacement**. In other words, you don’t replace the first item you choose before you choose a second. This changes the odds of choosing sample items. Taking the above example, you would have the same list of names to choose two people from. And your list of results would be similar, except you couldn’t choose the same person twice:

- John, Jack
- John, Qiu
- Jack, Qiu
- Jack, Tina…

But now, your two items are **dependent**, or linked to each other. When you choose the first item, you have a 1/7 probability of picking any particular name. But then, assuming you don’t replace the name, you only have six names to pick from. That gives you a 1/6 chance of choosing any particular second name. The probabilities become:

- P(John, Jack) = (1/7) * (1/6) = 1/42 ≈ .024.
- P(John, Qiu) = (1/7) * (1/6) = 1/42 ≈ .024.
- P(Jack, Qiu) = (1/7) * (1/6) = 1/42 ≈ .024.
- P(Jack, Tina) = (1/7) * (1/6) = 1/42 ≈ .024…
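
The same calculation without replacement can be sketched in Python. It again assumes equally likely names, and `without_replacement` is just an illustrative variable name.

```python
from fractions import Fraction
from itertools import permutations

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]
n = len(names)

# First draw: 1/7. Second draw: 1/6, because one name is already out of the hat.
p_pair = Fraction(1, n) * Fraction(1, n - 1)

# All ordered two-name samples with no repeats: 7 * 6 = 42 outcomes.
without_replacement = {pair: p_pair for pair in permutations(names, 2)}

print(len(without_replacement))                      # 42
print(("John", "John") in without_replacement)       # False
print(without_replacement[("John", "Jack")])         # 1/42
print(float(without_replacement[("John", "Jack")]))  # roughly 0.0238
print(sum(without_replacement.values()))             # 1
```

There are fewer possible outcomes (42 instead of 49), so each one is slightly more likely.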

As you can probably figure out, I’ve only used a few items here, so the odds only change a little. But larger samples taken from small populations can have more dramatic results.
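
To see how the gap grows with sample size, here is a rough sketch that compares the probability of one specific ordered sample of k different names under both schemes, using the same population of 7. The helper functions `p_with` and `p_without` are hypothetical names made up for this illustration.

```python
from fractions import Fraction

N = 7  # population size from the example


def p_with(k, n=N):
    """Probability of one specific ordered sample of k names, with replacement."""
    return Fraction(1, n) ** k


def p_without(k, n=N):
    """Probability of one specific ordered sample of k distinct names, without replacement."""
    p = Fraction(1)
    for already_drawn in range(k):
        # One fewer name is left in the hat after each draw.
        p *= Fraction(1, n - already_drawn)
    return p


for k in range(1, N + 1):
    ratio = p_without(k) / p_with(k)
    print(k, float(p_with(k)), float(p_without(k)), float(ratio))
```

For a sample of 2 the ratio is only 7/6, but it climbs quickly as the sample size approaches the population size.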

You can tell *how* dramatic these results are by calculating the covariance. That’s a measure of how strongly the two draws are linked together; the further the covariance is from zero, the more dramatic the results. A covariance of zero would mean there’s no difference between sampling with replacement and sampling without.
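
Covariance needs numbers rather than names, so the sketch below assumes each of the 7 people is given a hypothetical numeric value (1 through 7, not part of the original example). It then computes the exact covariance between the first and second draw under each scheme; `covariance` is an illustrative helper, not a library function.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical numeric values, one per person (an assumption for illustration).
values = [1, 2, 3, 4, 5, 6, 7]


def covariance(pairs):
    """Exact covariance of (first draw, second draw) over equally likely ordered pairs."""
    pairs = list(pairs)
    n = len(pairs)
    mean_first = Fraction(sum(x for x, _ in pairs), n)
    mean_second = Fraction(sum(y for _, y in pairs), n)
    mean_product = Fraction(sum(x * y for x, y in pairs), n)
    return mean_product - mean_first * mean_second


cov_with = covariance(product(values, repeat=2))   # repeats allowed
cov_without = covariance(permutations(values, 2))  # no repeats

print(cov_with)     # 0
print(cov_without)  # -2/3
```

With these assumed values, the covariance is exactly zero with replacement and negative without: drawing a high value first makes a high value slightly less likely on the second draw.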
