Samples are parts of a population. For example, you might have a list of information on 100 people out of a population of 10,000. You can use that list to make inferences about the entire population's behavior. Unfortunately, it's not quite that simple. When you do stats, your sample size must be optimal — not too large or too small. Then, once you've decided on a sample size, you must use a sound technique for actually drawing the sample from the population. There are two main types:
- Probability Sampling uses randomization to select sample members. The probability of each member being chosen for the sample is known, although the odds do not have to be equal.
- Non-probability sampling uses non-random techniques (i.e. the judgment of the researcher). This is where you can’t calculate the odds of any particular item, person or thing being included in your sample.
- Bernoulli sampling runs an independent Bernoulli trial on each population element to decide whether that element becomes part of the sample. Every element has the same probability of being included, so the sample size is not fixed in advance; it follows a binomial distribution. Poisson sampling is similar but less common: each element still gets an independent trial, but the inclusion probabilities may differ from element to element.
- Cluster sampling divides the population into groups (clusters). A random sample is then selected from the clusters. It’s used when researchers don’t know the individuals in a population but they do know which groups are in a population.
- In systematic sampling, elements are selected for a sample from an ordered sampling frame. A sampling frame is just a list of the participants you want to get a sample from. One type of systematic sampling is the equal-probability method: an element is selected from the list and then every kth element after it, using the equation k = N/n, where n is the sample size and N is the size of the population.
- A Simple Random Sample (SRS) is chosen completely at random, so that each element has the same probability of being chosen as any other element, and each subset of n elements has the same probability of being chosen as any other subset of n elements.
- In stratified sampling, each subpopulation is sampled independently. The population is first divided into homogeneous subgroups before getting the sample. Each population member only belongs to one group. Simple random or systematic sampling is applied within each group to choose the sample. Stratified Randomization is a sub-type of stratified sampling used in clinical trials. Patients are divided into strata and then randomized with permuted block randomization.
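The probability sampling methods above can be sketched in a few lines of code. This is an illustrative sketch only — the toy population of 100 labelled elements and the two strata ("low"/"high") are made up for the example.

```python
import random

population = list(range(1, 101))  # toy population of 100 labelled elements
n = 10                            # desired sample size

# Simple random sample (SRS): every subset of size n is equally likely.
srs = random.sample(population, n)

# Systematic sample: from a random start, take every k-th element, k = N/n.
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: split into homogeneous strata (hypothetical groups
# here), then take an SRS within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values()
              for x in random.sample(group, n // 2)]

print(len(srs), len(systematic), len(stratified))  # 10 10 10
```

Note how all three produce a sample of the same size but with very different structure: systematic sampling guarantees even coverage of the frame, while stratification guarantees representation of each subgroup.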
Less Common Types
If you are taking elementary statistics or AP statistics you’ll rarely (if ever) come across these techniques:
- Accidental sampling (also known as grab, convenience or opportunity sampling) is where a sample is drawn from a convenient, readily available population. It doesn’t give a representative sample for the population but can be useful for pilot testing.
- Demon algorithm (physics) is used to sample members of a microcanonical ensemble (used to represent the possible states of a mechanical system which has an exactly specified total energy) with a given energy. The “demon” is a degree of freedom in the system which stores and provides energy.
- Critical Case Samples: Critical cases are carefully chosen to maximize the information you can get from a handful of samples.
- Discrepant case sampling is where you choose cases that appear to contradict your findings.
- Distance sampling is a widely used technique that estimates the density or abundance of animal populations.
- The experience sampling method samples experiences (rather than individuals or members). In this method, study participants stop at certain times and make notes of their experiences as they experience them.
- Haphazard Sampling: where a researcher chooses items haphazardly, trying to simulate randomness. However, the result may not be random at all and is often tainted by selection bias.
- Inverse Sampling is based on negative binomial sampling. Samples are taken until a specified number of successes have happened.
- Importance Sampling: A method to model rare events.
- The Kish grid is a way to select members within a household for interviews and uses random number tables for the selections.
- Latin hypercube sampling is used to construct computer experiments. It generates samples of plausible collections of values for parameters in a multidimensional distribution.
- In line-intercept sampling, an element is included in a sample from a particular region if a certain line segment intersects the element.
- Maximum Variation Samples are taken when you want to include extremes (like rich/poor or young/old). A related technique is multistage sampling, one of a variety of cluster sampling techniques where random elements are chosen from a cluster (instead of every member in the cluster).
- Quota sampling is a way to select survey participants. It's similar to stratified sampling, but members from a group are chosen based on judgment. For example, people closest to the researcher might be chosen for ease of access.
- Respondent Driven Sampling. A chain-referral sampling method where participants recommend other people they know.
- A sequential sample is one that doesn’t have a set size; items are taken one (or a few) at a time. It’s commonly used in ecology.
- A Snowball sample is where existing study participants recruit future study participants from people they know.
- Square root biased sample is a way to decide who is chosen for additional screenings at airports. It is a combination of SRS and profiling.
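To give one of the less common techniques some concrete shape, here is a sketch of importance sampling for a rare event. The setup is my own illustration, not from the source: we estimate P(X > 4) for a standard normal X (about 3.17 × 10⁻⁵), which plain Monte Carlo would almost never observe, by drawing from a proposal distribution centred on the rare region and reweighting.

```python
import math
import random

random.seed(0)

# Rare event: P(X > 4) for X ~ N(0, 1). We sample from the proposal
# N(4, 1) so that draws land near the rare region, then reweight each
# draw by the density ratio p(x)/q(x), which for these two normals
# simplifies to exp(8 - 4x).
N = 100_000
total = 0.0
for _ in range(N):
    x = random.gauss(4, 1)            # draw from the proposal q
    weight = math.exp(8 - 4 * x)      # p(x)/q(x)
    if x > 4:                         # indicator of the rare event
        total += weight
estimate = total / N

print(estimate)  # should be close to the true value, about 3.17e-05
```

The reweighting keeps the estimate unbiased while concentrating nearly all the samples where the event actually happens, which is why importance sampling is the standard tool for modeling rare events.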
Sampling error is the error that occurs because you’re taking a sample from the population rather than using the entire population. In other words, it’s the difference between the statistic you measure and the parameter you would find if you took a census of the entire population.
If you were to survey the entire population (as in the US Census), there would be no sampling error. Exactly how much error there is in any one sample can't be calculated. However, when samples are taken at random, the error can be estimated; that estimate is called the margin of error. For example, you might take a survey of 1,000 people and estimate that 19.357% of the population is aged under 18. If the actual percentage is 19.300%, the difference (19.357 − 19.300 = 0.057 percentage points) is the sampling error for that particular sample. If you continued to take samples of 1,000 people, you'd get slightly different statistics each time — 19.1%, 18.9%, 19.5% and so on — but they would all cluster around the same figure. This is one of the reasons you'll often see sample sizes of 1,000 or 1,500 in surveys: they produce a very acceptable margin of error of about 3%.
Formula: the margin of error is approximately 1/√n, where n is the size of the sample. For example, a random sample of 1,000 has a margin of error of about 1/√1000 ≈ 3.2%.
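The 1/√n rule is easy to check directly. A minimal sketch (the function name `margin_of_error` is my own):

```python
import math

def margin_of_error(n):
    """Rough margin of error for a survey proportion, via the 1/sqrt(n) rule."""
    return 1 / math.sqrt(n)

for n in (100, 1000, 1500):
    print(n, round(margin_of_error(n) * 100, 1))
# 100  -> 10.0 (%)
# 1000 -> 3.2 (%)
# 1500 -> 2.6 (%)
```

This also makes the diminishing returns visible: going from 100 to 1,000 respondents cuts the error from about 10% to about 3%, but adding another 500 respondents only shaves off roughly another half a percentage point.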
Sampling error can't be eliminated, but it can be reduced. It is considered an acceptable trade-off for not having to measure the entire population. In general, the larger the sample, the smaller the margin of error. There is a notable exception: cluster sampling may increase the error because of the similarities between cluster members. A carefully designed experiment or survey can also reduce error.
Another Type of Error
Another reason there could be a difference between the sample statistic and the actual population parameter is called non-sampling error. This is due to poor data collection methods (like faulty instruments or inaccurate data recording), selection bias, non-response bias (where individuals don't want to or can't respond to a survey), or other mistakes in collecting the data. Increasing the sample size will not reduce these errors. The key is to avoid making the errors in the first place with a well-planned design for the survey or experiment.
If you prefer an online interactive environment to learn R and statistics, this free R Tutorial by Datacamp is a great way to get started. If you're somewhat comfortable with R and are interested in going deeper into statistics, try this Statistics with R track.
- What is the Large Enough Sample Condition?
- What is a Sample?
- How to Find a Sample Size in Statistics.
- What is the 10% Condition?
- What is Efficiency?
- What is an Effective Sample Size?
- Finite Population Correction Factor.
- Markov Chain Monte Carlo
- What is a Typical Case?
- What is a Sample Size?
- How to Use Slovin’s Formula.
- How to Find a Sample Size Given a Confidence Interval and Width (Known or Unknown Standard Deviation).
- Sampling Distributions.
- Sampling Distribution of the Sample Proportion.
- Sampling variability.