Self-selection bias arises in a research project when participants choose whether or not to take part, and the group that chooses to participate is not equivalent (in terms of the research criteria) to the group that opts out.
Examples of Self-Selection Bias
Suppose you were running a mail-in poll on how many people in a district could read. Your results would be severely affected by self-selection bias, because only those who received the survey and were able to read it would be likely to send it back.
Most examples of self-selection bias are less obvious but can still skew results substantially. Take a poll that measures confidence in parenting among university graduates. Those who are proud of their parenting are more likely to want to talk about it, and therefore more likely to fill in the survey. Graduates who are not confident in their parenting ability, or who are ashamed of their track record, will be underrepresented.
Self-selection bias happens in more areas than just sociology; biology is rife with it as well. For an extreme example, consider a study on the eating habits of deer in which you recorded behavior from a lawn chair. Only the subset of deer that felt comfortable enough around humans to wander onto the lawn in plain view would be included in the study. Those that preferred forest ferns to garden flowers would be underrepresented.
Managing Self-Selection Bias in Research
As a researcher, you will want to design your experiment to reduce self-selection bias as much as possible. Where self-selection bias cannot be eliminated, it should be quantified as far as possible so that you understand what you are dealing with. For example, comparing your sample with population data can help you see how your sample might differ from a random one. Reporting relevant self-selection bias should always be part of reporting survey results.
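One simple way to make the comparison with population data concrete is to compute, for each group, the difference between its share of the sample and its share of the population. The sketch below is only an illustration: the function name representation_gap, the age groups, and all of the figures are hypothetical, and it assumes you have reliable population shares (for example, from a census) for some characteristic you also recorded for respondents.

```python
from collections import Counter


def representation_gap(sample_groups, population_shares):
    """Return (sample share - population share) for each population group.

    A positive gap means the group is overrepresented in the sample;
    a negative gap means it is underrepresented.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_shares.items()
    }


# Hypothetical figures, for illustration only.
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_groups = ["18-34"] * 10 + ["35-54"] * 30 + ["55+"] * 60

gaps = representation_gap(sample_groups, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large gap on an observed characteristic does not prove that the survey answers themselves are biased, but it is a warning sign worth reporting alongside the results.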
In our first example, you could replace the mail-in poll with an in-person, door-to-door survey. You might go personally to each randomly selected address and ask about reading skills. Some bias would remain, because people may be ashamed to say “I can’t read”, but it would be smaller. If you could find a way to test each person’s reading without the respondent realizing it, you would be likely to eliminate self-selection bias almost entirely.
If you can’t eliminate self-selection bias, you might weight the results instead. Sample points that are less likely to have been included may be given more weight than points that are likely to self-select.
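As a minimal sketch of weighting, the example below uses post-stratification, which is one common scheme among several: each respondent is weighted by the ratio of their group's population share to its sample share, so underrepresented groups count for more. The function names, the urban/rural groups, and every number here are hypothetical. The approach also assumes that respondents within a group resemble the non-respondents in that same group, which may not hold in practice.

```python
def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by population share / sample share of their group."""
    n = len(sample_groups)
    counts = {}
    for group in sample_groups:
        counts[group] = counts.get(group, 0) + 1
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: 1 = "yes", 0 = "no". Urban respondents were far more
# likely to return the survey, although the population is split evenly.
groups = ["urban"] * 80 + ["rural"] * 20
answers = [1] * 60 + [0] * 20 + [1] * 5 + [0] * 15
population_shares = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(groups, population_shares)
raw_mean = sum(answers) / len(answers)
adjusted_mean = weighted_mean(answers, weights)

print(f"raw: {raw_mean:.2f}, weighted: {adjusted_mean:.2f}")
```

In this made-up case the unweighted "yes" rate is pulled upward by the overrepresented urban group, and weighting moves the estimate back toward what an evenly split sample would give.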