Fake Statistics: Overview
Survey results can look trustworthy, but it isn’t always easy to spot fake statistics. Do you believe an egg company when it tells you 50% of consumers in a taste test preferred its eggs? How about if a voluntary survey of U.S. Marines showed overwhelming support for massive pay increases for military personnel? Sometimes it isn’t enough to accept the data as it is presented. Dig a little deeper and you might uncover one of these common problems with stats.
Finding Fake Statistics: Steps
Step 1: Take a close look at who paid for the survey. If you read a statistic stating 90% of people lost 20 pounds in a month on a certain “miracle” diet, look at who paid. If it was the company that owns that “miracle” product, then it’s likely you have what’s called a self-interested study. In a self-interested study, someone stands to gain financially from the results of a trial or survey. You may have seen those soda ads where “90% of people prefer the taste of product X.” But if the manufacturer of product X paid for that survey, you should treat the results with suspicion until you can see how the survey was run.
Step 2: Check whether the statistics came from a voluntary survey. A voluntary response sample is a sample where the participants choose whether or not to be included. For example, if your professor sent you an email with an invitation to comment on what you think of a new textbook, then that would be a voluntary response sample. If responding was a mandatory part of your course, then that would not be a voluntary response sample. Voluntary samples are a poor basis for conclusions about the whole group because they carry a heavy bias toward people who have strong opinions (often negative ones). In other words, students are more likely to respond to the above survey if they hate the textbook. The students who like it will probably be less likely to respond.
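A rough sketch of how much a voluntary survey can distort the picture. The function name and all three input numbers below are made up for illustration; the only point is that unequal response rates change who ends up in the sample.

```python
def observed_share(true_share_dislike, response_rate_dislike, response_rate_like):
    """Expected share of respondents who dislike the textbook."""
    dislike_responses = true_share_dislike * response_rate_dislike
    like_responses = (1 - true_share_dislike) * response_rate_like
    return dislike_responses / (dislike_responses + like_responses)

# Hypothetical class: 30% dislike the book. Half of the unhappy students
# reply to the email, but only 1 in 10 of the satisfied students does.
share = observed_share(0.30, 0.50, 0.10)
print("True share who dislike the book: 30%")
print(f"Share among the people who replied: {share:.0%}")
```

Under these assumed response rates, a book that most of the class is happy with looks unpopular, simply because the unhappy students were more likely to answer.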
Step 3: Look for the faulty conclusion that one variable causes another. For example, you might read a statistic that states unemployment causes an increase in corn production, on the reasoning that corn products (like high fructose corn syrup) are cheap and unemployed people are more likely to buy cheap foods. But many other factors could be driving the increase in production, including an increase in government subsidies for corn. Just because one factor is seemingly connected to another (correlation), that doesn’t necessarily imply causation (that one caused the other). More info: see Correlation vs. Causation.
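A small sketch of the corn example. Every number here is invented for illustration: corn production is computed from subsidies alone, and unemployment is a separate series that just happens to rise over the same years. The two still come out strongly correlated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up yearly figures (not real data).
subsidy = [1, 2, 3, 4, 5, 6]                  # corn subsidies, arbitrary units
unemployment = [5.0, 5.4, 5.9, 6.1, 6.8, 7.0]  # unemployment rate, percent

# By construction, corn production depends ONLY on subsidies.
corn = [10 + 2 * s for s in subsidy]

r = pearson_r(unemployment, corn)
print(f"Correlation between unemployment and corn production: {r:.2f}")
```

The correlation is close to 1 even though unemployment plays no part in how the corn figures were generated, which is exactly the trap the step describes.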
Step 4: Beware of publication bias. Journals are more likely to publish positive results (for example, a drug trial that had a positive outcome) than negative ones (a drug trial that failed). Just because a journal publishes a positive result doesn’t mean that there aren’t other trials out there that found a negative result.
Step 5: Make sure the sample isn’t too limited in scope. It’s unlikely you can make generalizations about student achievement in the U.S. by studying a single inner city school in Brooklyn. And it’s unlikely you can make generalizations about American voting behavior by standing outside a polling place in Ponte Vedra Beach, Florida. Just as inner city schools don’t behave like every other school, an affluent neighborhood can’t be used to generalize about the voting population. Also, make sure the sample size is large enough. If your voting precinct contains 1 million voters, it’s unlikely you’ll get any good results from surveying 20 people.
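To get a feel for why 20 people is too few, here is a sketch using the standard margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case p = 0.5; the normal approximation behind it is itself rough at n = 20, so treat the first figure as a ballpark.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (20, 100, 1000):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

With 20 respondents the margin is above 20 percentage points, so a result of "55% support" could not be told apart from a clear majority against. Note that a large sample does not fix the scope problem described above: a big sample from one affluent neighborhood is still a sample of that neighborhood.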
Step 6: Watch out for misleading percentages. New unemployment claims may have “slowed by 50%,” but if there were previously 100,000 new claims per month, that still means 50,000 people are joining the unemployed ranks every month.
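The arithmetic behind that example, as a short sketch. The function name is made up for illustration; the figures are the ones used in the step.

```python
def claims_after_slowdown(previous_claims, slowdown):
    """New claims per month after the rate of new claims drops by `slowdown`."""
    return round(previous_claims * (1 - slowdown))

previous_claims = 100_000
current_claims = claims_after_slowdown(previous_claims, 0.50)
print(f"New claims per month before: {previous_claims:,}")
print(f"New claims per month after a 50% slowdown: {current_claims:,}")
```

A 50% drop in the rate of new claims is not a 50% drop in unemployment: the total is still growing, just more slowly.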
Step 7: Beware of precise numbers. If a national survey reports that 3,150,023 households in the U.S. are dog owners, you might be inclined to believe that exact figure. However, it’s highly unlikely that anyone seriously surveyed every household in the U.S. It’s much more likely they surveyed a sample, in which case 3,150,023 is an estimate and should have been reported as about 3 million to avoid being misleading.
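One way to report an estimate honestly is to round it to a small number of significant figures. The helper below is a minimal sketch (the function name is an assumption, not a standard library call); how many figures are justified depends on the survey's actual margin of error, which this example does not know.

```python
import math

def round_to_sig_figs(value, sig_figs=1):
    """Round a number to the given count of significant figures."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    factor = 10 ** (magnitude - sig_figs + 1)
    return round(value / factor) * factor

estimate = 3_150_023
print(f"Reported:       {estimate:,}")
print(f"1 sig. figure:  {round_to_sig_figs(estimate, 1):,}")
print(f"2 sig. figures: {round_to_sig_figs(estimate, 2):,}")
```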
There are many other examples of fake statistics. Newspapers sometimes print erroneous figures, drug companies have published misleading test results, and governments sometimes present statistics slanted in their favor. The golden rule is: question every statistic that you read!