What is Misclassification?
Misclassification (or classification error) happens when a participant is placed into the wrong population subgroup or category because of some kind of observational or measurement error. When this happens, the true link between exposure and outcome is distorted.
People might be placed into the wrong groups because of:
- Incomplete medical records.
- Recording errors in records.
- Misinterpretation of records.
- Other data errors, like incorrect disease codes, or patients completing questionnaires incorrectly, perhaps because they don’t remember (see: “recall bias”) or because they misunderstand the question.
Although care can be taken to minimize the impact of these errors, they are largely unavoidable because human error is innate to any study involving people.
Differential classification error happens when the errors depend on the values of other variables. Non-differential classification error happens when the errors do not depend on the values of other variables.
Differential misclassification happens when the information errors differ between groups. In other words, the bias is different for exposed and non-exposed, or between those who have the disease and those who do not.
Example of Differential Classification Error (from Ahrens & Pigeot):
Emphysema is diagnosed more frequently in smokers than in non-smokers. However, smokers may visit the doctor more often for other conditions (e.g. bronchitis) than non-smokers. This means that one reason smokers could be diagnosed with emphysema more often is simply that they go to the doctor more often, not that they actually have higher odds of getting the disease. Unless steps are taken to control for this possibility, emphysema will be under-diagnosed in non-smokers relative to smokers. This is a differential classification error because the chance of a correct diagnosis depends on another variable: how often a person visits the doctor, which differs between smokers and non-smokers.
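To make the direction of this bias concrete, here is a minimal Python sketch. All numbers are hypothetical assumptions chosen for illustration, not figures from Ahrens & Pigeot: the sketch assumes the true risk of emphysema is identical in smokers and non-smokers (purely to isolate the effect of the error), that doctors detect 90% of true cases among smokers but only 50% among non-smokers, and that nobody is falsely diagnosed.

```python
def observed_risk(true_risk, sensitivity, specificity=1.0):
    """Expected proportion diagnosed, given imperfect detection.

    sensitivity: share of true cases that get diagnosed
    specificity: share of non-cases correctly left undiagnosed
    """
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


# Hypothetical assumption: the same true risk in both groups.
true_risk = 0.10
true_rr = true_risk / true_risk  # true risk ratio is exactly 1 (the null)

# Hypothetical assumption: detection depends on smoking status,
# because smokers see the doctor more often.
risk_smokers = observed_risk(true_risk, sensitivity=0.9)
risk_nonsmokers = observed_risk(true_risk, sensitivity=0.5)

observed_rr = risk_smokers / risk_nonsmokers
print(f"true RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

Under these assumptions the observed risk ratio comes out well above 1 even though the true risk ratio is 1, so the unequal detection alone creates an apparent association.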
Non-differential classification error happens when the information is incorrect, but the error is the same across groups. It happens when errors in classifying exposure are unrelated to other variables (including disease), or when errors in classifying disease are unrelated to other variables (including exposure). Bias introduced by non-differential misclassification of a two-level variable is usually predictable (it goes towards the null value), but this isn’t always the case. With three or more exposure groups (levels), the bias can go away from the null.
- In case-control studies, non-differential misclassification can happen when exposure status is incorrect to the same degree for both controls and cases.
- In cohort studies, it happens when exposure status is incorrect to the same degree for people with the disease and those without the disease.
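The pull towards the null can be checked with a little arithmetic. The following is a minimal Python sketch for a case-control study with a two-level exposure. The counts, the sensitivity (0.8) and the specificity (0.9) are hypothetical assumptions chosen for illustration; the key point is that the same error rates are applied to cases and controls, which is what makes the error non-differential.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after imperfect
    classification of a two-level exposure."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio from a 2x2 case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Hypothetical true counts.
cases = (200, 100)     # (exposed, unexposed)
controls = (100, 200)  # (exposed, unexposed)
true_or = odds_ratio(*cases, *controls)

# Non-differential: identical error rates for cases and controls.
SENSITIVITY, SPECIFICITY = 0.8, 0.9
obs_cases = misclassify(*cases, SENSITIVITY, SPECIFICITY)
obs_controls = misclassify(*controls, SENSITIVITY, SPECIFICITY)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these assumed inputs the observed odds ratio lands between 1 and the true odds ratio: the association is still visible, but weaker than it really is.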
Example of non-differential misclassification (from Ahrens & Pigeot):
Many studies ask if a patient has “ever used” a particular drug. As this question covers an extremely large time span (possibly many decades), drug use might get erroneously linked to some disease or condition. But as everyone in the study is asked the same error-inducing question, misclassification happens to everyone in the study.
Towards the null means that the estimate is closer to the null value of the effect measure than the true value is. For example, the estimate would be closer to 1 if you’re using the odds ratio or risk ratio, so the data indicate a weaker association than actually exists. A bias away from the null would mean that the data indicate a stronger association than actually exists in real life.
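For ratio measures such as the odds ratio or risk ratio, one way to express this is to compare distances from the null on the log scale, where the null value 1 becomes 0. The helper below is a hypothetical illustration (the function name and the example values are assumptions, not part of any library or of the source text):

```python
import math


def bias_direction(true_ratio, observed_ratio):
    """Describe how an observed ratio measure (OR or RR) is biased
    relative to the true value, using distance from the null (1)."""
    true_log = math.log(true_ratio)      # 0 when the true ratio is 1
    obs_log = math.log(observed_ratio)
    if math.isclose(true_log, obs_log, abs_tol=1e-12):
        return "no bias"
    if true_log * obs_log < 0:
        # Estimate is on the opposite side of 1 from the truth.
        return "crosses the null"
    if abs(obs_log) < abs(true_log):
        return "toward the null"
    return "away from the null"


# Illustrative values only.
print(bias_direction(4.0, 2.6))   # weaker association than the truth
print(bias_direction(1.0, 1.8))   # association where none exists
print(bias_direction(0.5, 0.8))   # protective effect looks weaker
```

Working on the log scale treats ratios above and below 1 symmetrically, so a protective effect of 0.5 that is observed as 0.8 counts as biased toward the null, just like 4.0 observed as 2.6.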
Ahrens & Pigeot. Handbook of Epidemiology. Springer Science & Business Media, Jul 26, 2007.