What is Bernoulli Sampling?
Bernoulli sampling is an equal-probability, without-replacement sampling design. In this method, independent Bernoulli trials on population members determine which members become part of a sample. All members have an equal chance of being part of the sample. The sample size in Bernoulli sampling is not fixed, because each member is considered separately for the sample. The method was first introduced by statistician Leo Goodman in 1949, as “binomial sampling”.
The sample size follows a binomial distribution and can take on any value between 0 and N (where N is the size of the population). If π is the probability of a member being chosen, then the expected value (EV) for the sample size is πN. For example, let’s say you had a population of 100 items and the probability of choosing any one item is 0.1; then the EV would be 0.1 * 100 = 10. However, the sample size could theoretically be anywhere from 0 to 100.
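The behavior of the sample size can be checked with a quick simulation. The following is a minimal sketch in base R, assuming the population of 100 and selection probability of 0.1 from the example; the variable names (N, p, included, sizes) are illustrative only.

```r
set.seed(1)          # fixed seed so the run is reproducible
N <- 100             # population size
p <- 0.1             # probability that any one member is chosen (pi in the text)

# One Bernoulli sample: each member is included independently with probability p
included <- runif(N) < p
sum(included)        # realized sample size for this single draw

# Repeat the draw many times to see how the sample size varies
sizes <- replicate(10000, sum(runif(N) < p))
mean(sizes)          # should be close to N * p = 10
var(sizes)           # should be close to N * p * (1 - p) = 9
range(sizes)         # always within 0..N, but rarely near the extremes
```

The simulated sizes cluster around the expected value, which is why the theoretical range of 0 to 100 is seldom a practical concern.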
Example of Bernoulli Sampling: A researcher has a list of 1,000 candidates for a clinical trial. He wants to get an overview of the candidates and so decides to take a Bernoulli sample to narrow the field. For each candidate, he rolls a die: if it’s a 1, the candidate goes into a pile for further analysis. If it’s any other number, the candidate goes into another pile that isn’t looked at. The EV for the sample size is 1/6 * 1,000 ≈ 167.
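The die-rolling procedure can be imitated in base R. This is only a sketch of the example, assuming a fair six-sided die and candidates numbered 1 to 1,000; the names n_candidates, rolls, and selected are made up for illustration.

```r
set.seed(42)                       # fixed seed so the run is reproducible
n_candidates <- 1000

# Roll one fair six-sided die per candidate
rolls <- sample(1:6, size = n_candidates, replace = TRUE)

# Candidates whose roll is 1 go into the pile for further analysis
selected <- which(rolls == 1)

length(selected)                   # realized sample size; varies from run to run
n_candidates / 6                   # expected sample size, about 166.67

# Interval covering roughly 95% of the possible sample sizes
qbinom(c(0.025, 0.975), size = n_candidates, prob = 1/6)
```

The last line gives a rough idea of how far the actual pile is likely to stray from the expected 167 candidates.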
An advantage of Bernoulli sampling is that it is one of the simplest sampling methods. One disadvantage is that it’s not known at the outset how large the sample will be.
In SAS: Bernoulli sampling is specified with METHOD=BERNOULLI in PROC SURVEYSELECT. The sampling rate is specified with the SAMPRATE= option.
In R: S.BE(N, prob), from the TeachingSampling package, will choose a sample from a population of size N, including each unit with probability prob. The package documentation has several examples of the use of S.BE, including this example:
library(TeachingSampling)  # provides S.BE
# Vector U contains the labels of a population of size N=5
U <- c("Yves", "Ken", "Erik", "Sharon", "Leslie")
# Draws a Bernoulli sample without replacement of expected size n=3
# The inclusion probability is 0.6 for each unit in the population
sam <- S.BE(5, 0.6)
sam
# The selected sample is
U[sam]
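If the package isn’t installed, the same kind of draw can be sketched in base R. This is not how S.BE is implemented; it is just an equivalent design under the assumption that each unit is included independently with the same probability. The helper name bernoulli_sample is hypothetical.

```r
# Hypothetical helper: returns the indices of the selected units
bernoulli_sample <- function(N, prob) {
  stopifnot(N >= 0, prob >= 0, prob <= 1)
  which(runif(N) < prob)   # one independent trial per unit
}

U <- c("Yves", "Ken", "Erik", "Sharon", "Leslie")
set.seed(123)              # fixed seed so the run is reproducible
sam <- bernoulli_sample(length(U), 0.6)
sam                        # indices of the selected units
U[sam]                     # selected labels; may be empty or the whole population
```

Because each unit gets its own trial, no unit can be selected twice, and the number of selected units changes from one run to the next.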
The Bernoulli Distribution
A Bernoulli distribution is usually described with two quantities, n and p, although only p can vary.
- “n” represents how many times an experiment is repeated. In a Bernoulli distribution, n = 1.
- “p” is the probability of a specific outcome happening. For example, rolling a die to get a six gives a probability of 1/6. The Bernoulli distribution for a die landing on an odd number would have p = 1/2.
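The Bernoulli and binomial distributions are often confused with each other. The Bernoulli distribution describes a single trial, while the binomial distribution describes the number of successes in n independent trials that each have the same probability p. Technically, the Bernoulli distribution is the binomial distribution with n = 1, so the two coincide only in that case.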
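The relationship between the two distributions can be seen with R’s built-in binomial functions. This is a small sketch, assuming p = 1/6 (rolling a six) and n = 10 trials purely for illustration.

```r
p <- 1/6   # probability of success, e.g. rolling a six

# Bernoulli probabilities are binomial probabilities with size = 1
dbinom(0:1, size = 1, prob = p)          # P(failure), P(success)

# A binomial count is the sum of n independent Bernoulli trials
set.seed(7)                              # fixed seed so the run is reproducible
n <- 10
trials <- rbinom(n, size = 1, prob = p)  # n Bernoulli outcomes, each 0 or 1
sum(trials)                              # one draw from Binomial(n, p)

# With n > 1 the binomial count can exceed 1, which a single Bernoulli trial cannot
dbinom(0:n, size = n, prob = p)
```

Setting size = 1 is all it takes to turn R’s binomial functions into Bernoulli ones.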
A Bernoulli distribution describes the outcome of a single Bernoulli trial. Each Bernoulli trial has a single outcome, chosen from S, which stands for success, or F, which stands for failure. For example, you might try to find a parking space. You are either going to be successful, or you are going to fail. Many real-life situations can be simplified to either success or failure, which can be represented by Bernoulli distributions.