
## What is Misclassification?

Misclassification (or classification error) happens when a participant is placed into the wrong population subgroup or category because of some kind of observational or measurement error. When this happens, the true link between exposure and outcome is distorted. People might be placed into the wrong groups because of:

- Incomplete medical records.
- Recording errors in records.
- Misinterpretation of records.
- Errors in the data itself, like incorrect disease codes, or patients completing questionnaires incorrectly, perhaps because they don’t remember (see: “recall bias”) or because they misunderstand the question.

Although care can be taken to minimize the impact of these errors, they are largely unavoidable because human error is innate to any study involving people.

**Differential** classification error happens when the errors depend on the values of other variables. **Non-differential** classification error happens when the errors *do not* depend on the values of other variables.

## Differential Misclassification

Differential misclassification happens **when the information errors differ between groups.** In other words, the bias is different for exposed and non-exposed, or between those who have the disease and those who do not.

**Example of Differential Classification Error** (from Ahrens & Pigeot):

Emphysema is diagnosed more frequently in smokers than in non-smokers. However, smokers may visit the doctor more often than non-smokers for other conditions (e.g. bronchitis), so part of the reason they are diagnosed with emphysema more often may simply be that they have more opportunities to be diagnosed, not that they actually have higher odds of getting the disease. Unless steps are taken to control for this possibility, emphysema will be under-diagnosed in non-smokers. The classification error is differential because the chance of a missed diagnosis depends on another variable: how often smokers visit the doctor compared with non-smokers.
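
As a rough numeric sketch of this example, suppose (purely for illustration) that the true risk of emphysema is identical in smokers and non-smokers, but that existing cases are more likely to be detected in smokers. The group sizes, the 10% true risk, and the detection probabilities below are made-up assumptions, not figures from Ahrens & Pigeot, and the sketch assumes no false-positive diagnoses.

```python
def observed_risk(n_people, true_risk, detection_probability):
    """Observed (diagnosed) risk when only some true cases are detected.

    Assumes no false positives: nobody without the disease is diagnosed.
    """
    true_cases = n_people * true_risk
    diagnosed_cases = true_cases * detection_probability
    return diagnosed_cases / n_people


# Hypothetical inputs: same true risk, different chance of being diagnosed.
TRUE_RISK = 0.10
smoker_risk = observed_risk(1000, TRUE_RISK, detection_probability=0.90)
nonsmoker_risk = observed_risk(1000, TRUE_RISK, detection_probability=0.60)

true_rr = TRUE_RISK / TRUE_RISK              # 1.0: no real association
observed_rr = smoker_risk / nonsmoker_risk   # about 1.5: a spurious association

print(f"True risk ratio:     {true_rr:.2f}")
print(f"Observed risk ratio: {observed_rr:.2f}")
```

Because the detection probability differs between the exposure groups, the observed risk ratio moves away from the true value of 1 even though smoking has no effect on disease in this toy setup.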

## Non-Differential Misclassification

Non-differential classification error happens when **the information is incorrect, but the errors are the same across groups.** It happens when errors in classifying exposure are unrelated to other variables (including disease), or when errors in classifying disease are unrelated to other variables (including exposure). Bias introduced by non-differential misclassification is usually predictable (it goes towards the null value), but this isn’t always the case. For example, with three or more exposure groups (levels), the bias can be away from the null.

- In case-control studies, non-differential misclassification can happen when exposure status is incorrect for both controls and cases.
- In cohort studies, it happens when exposure status is incorrect for people with the disease and those without the disease.
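
The pull towards the null can be sketched with a small calculation for a case-control study with two exposure levels. The true counts, sensitivity, and specificity below are hypothetical values chosen for illustration. The key assumption is that the same sensitivity and specificity apply to cases and controls, which is what makes the error non-differential.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification.

    sensitivity: probability a truly exposed person is recorded as exposed.
    specificity: probability a truly unexposed person is recorded as unexposed.
    """
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


# Hypothetical true counts: (exposed, unexposed)
true_cases = (200, 100)
true_controls = (100, 200)
true_or = odds_ratio(true_cases[0], true_cases[1], true_controls[0], true_controls[1])

# Same error rates in both groups: non-differential misclassification.
SENSITIVITY, SPECIFICITY = 0.8, 0.9
obs_cases = misclassify(*true_cases, SENSITIVITY, SPECIFICITY)
obs_controls = misclassify(*true_controls, SENSITIVITY, SPECIFICITY)
observed_or = odds_ratio(obs_cases[0], obs_cases[1], obs_controls[0], obs_controls[1])

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

With these made-up inputs the true odds ratio is 4.0 and the expected observed odds ratio is roughly 2.6, that is, closer to the null value of 1.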

**Example of non-differential misclassification** (from Ahrens & Pigeot):

Many studies ask if a patient has “ever used” a particular drug. As this question covers an extremely large time span (possibly many decades), drug use might get erroneously linked to some disease or condition. But as everyone in the study is asked the same error-inducing question, misclassification happens to everyone in the study.

Dr. Katherine M. Flegal, PhD, from the Stanford University School of Medicine wrote to me regarding how differential misclassification can arise even though an exposure has non-differential measurement error:

“Let’s say I measure a variable X with error as X’ and then put people into a category based on their value of X’. People who have a value of X’ close to the top of the category are more likely to be misclassified into the next higher category than are people with values of X’ close to the middle of the category. People with a value of X’ close to the bottom are more likely to be misclassified into the next lower category. Now lets say that X is also associated with some FUTURE outcome. Well people with a value of X’ close to the top of the category are more likely to develop the outcome than are people with values of X’ close to the bottom of the category. So now people with a high value of X’ are both more likely to be misclassified and also more likely to develop the outcome. So even though the measurement error was non-differential, the misclassification is differential and the direction is not necessarily towards the null. And this can happen in prospective studies in which X is measured at baseline before the outcome has even occurred.

Precisely this issue arises all the time. For instance, researchers use self-reported weight and height data to calculate BMI, but the calculated BMI has measurement error because of the self-report. Then the researchers divide BMI into categories like normal weight, overweight etc. Even though the study is prospective the misclassification will be differential (unless of course X itself is not associated with the outcome at all.)”
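
The mechanism described in the quote can be explored with a small simulation. This is only a sketch under stated assumptions: the standard normal distribution for X, the error standard deviation, the single cut-off at 0, and the logistic outcome model are all hypothetical choices made for illustration, not taken from Dr. Flegal's papers. The measurement error is drawn independently of both X and the outcome, so it is non-differential by construction.

```python
import math
import random


def simulate(n=200_000, error_sd=0.5, cutoff=0.0, seed=1):
    """Return the proportion correctly classified, keyed by (outcome, true_high)."""
    rng = random.Random(seed)
    # counts[(outcome, true_high)] = [number of people, number correctly classified]
    counts = {(o, t): [0, 0] for o in (0, 1) for t in (0, 1)}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # true exposure X
        x_measured = x + rng.gauss(0.0, error_sd)  # X' = X + non-differential error
        # Hypothetical outcome model: risk rises with the TRUE value of X.
        p_outcome = 1.0 / (1.0 + math.exp(-(-2.0 + x)))
        outcome = 1 if rng.random() < p_outcome else 0
        true_high = 1 if x > cutoff else 0
        observed_high = 1 if x_measured > cutoff else 0
        cell = counts[(outcome, true_high)]
        cell[0] += 1
        if observed_high == true_high:
            cell[1] += 1
    return {key: correct / total for key, (total, correct) in counts.items()}


rates = simulate()
for outcome in (1, 0):
    label = "with outcome" if outcome else "without outcome"
    print(f"{label}: sensitivity={rates[(outcome, 1)]:.3f}, "
          f"specificity={rates[(outcome, 0)]:.3f}")
```

In this setup, sensitivity (correctly classifying people who are truly in the high category) and specificity (correctly classifying people who are truly in the low category) come out different for people who develop the outcome and people who do not. The error in X' never depended on the outcome, yet the misclassification of the category does, which is the point made in the quote.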

If you’re interested in reading further on this topic, Dr. Flegal included the following resources:

Brenner H, Blettner M. Misclassification bias arising from random error in exposure measurement: implications for dual measurement strategies. Am J Epidemiol. 1993;138(6):453-61.

Brenner H, Loomis D. Varied forms of bias due to nondifferential error in measuring exposure. Epidemiology. 1994;5(5):510-7.

Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol. 1990;132(4):746-8.

Flegal KM, Keyl PM, Nieto FJ. Differential misclassification arising from nondifferential errors in exposure measurement. Am J Epidemiol. 1991;134(10):1233-44.

Wacholder S, Dosemeci M, Lubin JH. Blind assignment of exposure does not always prevent differential misclassification. Am J Epidemiol. 1991;134(4):433-7.

Wacholder S, Hartge P, Lubin JH, Dosemeci M. Non-differential misclassification and bias towards the null: a clarification. Occup Environ Med. 1995;52(8):557-8.

**Notes**:

**Towards the null** means that the biased estimate is closer to the null value of the effect measure than the true value is. For example, the estimate would be closer to 1 if you’re using the odds ratio or risk ratio. A bias **away from the null** would mean that the data is indicating a stronger association than actually exists in real life.

**References**:

Ahrens & Pigeot. Handbook of Epidemiology. Springer Science & Business Media, Jul 26, 2007.