 # Dummy Variables / Indicator Variable: Simple Definition, Examples


## What are Dummy Variables?

Dummy variables (sometimes called indicator variables) are used in regression analysis and latent class analysis. As the name implies, these variables are artificial attributes, and they are used with two or more categories or levels. In regression, a dummy variable takes only the values 0 and 1, marking whether or not an observation belongs to a particular category. Dummy variables are used when you want to work with categorical variables whose categories have no quantifiable relationship with each other.

For example, race can be categorized as Caucasian, African American, Asian, Hispanic, or Other. If you assigned the numbers 1-5 to these categories when performing regression analysis, the results would make no sense at all (is the "Other" category in any way 5 times the "Caucasian" category?). However, if you create a variable called Caucasian and assign it the value 1 to mean "is Caucasian" and 0 to mean "is not Caucasian," then you can start to see how dummy variables are useful.
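The contrast between arbitrary numeric codes and a single dummy variable can be sketched in a few lines of Python. This is a minimal illustration, assuming a small made-up sample of the "Race" variable; the sample values and variable names are hypothetical.

```python
# Hypothetical sample of the "Race" variable (made-up data for illustration).
race = ["Caucasian", "Asian", "Caucasian", "Hispanic", "Other"]

# Arbitrary numeric codes 1-5: the numbers have no real magnitude, so a
# regression would wrongly treat "Other" (5) as five times "Caucasian" (1).
codes = {"Caucasian": 1, "African American": 2, "Asian": 3, "Hispanic": 4, "Other": 5}
race_codes = [codes[r] for r in race]

# A single dummy (indicator) variable: 1 = "is Caucasian", 0 = "is not Caucasian".
is_caucasian = [1 if r == "Caucasian" else 0 for r in race]

print(race_codes)    # [1, 3, 1, 4, 5]
print(is_caucasian)  # [1, 0, 1, 0, 0]
```

The dummy column only records membership in one category, so it carries no false ordering or magnitude between groups.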

In latent class analysis, the term indicator variable means something more specific, although it’s still an artificial variable. A set of observed variables can “indicate” the presence of one or more latent (hidden) variables — hence the term indicator variable.

## Coding Categorical Variables with Multiple Levels

If a categorical variable has more than two levels (the levels are the distinct groups within a single independent variable), you need to create more than one dummy variable. In the example above, the categorical variable "Race" has five levels (Caucasian, African American, Asian, Hispanic, Other). The formula k-1, where k is the number of levels, tells you how many dummy variables to code. In other words, only four of these five levels are coded with dummy variables. The level you leave out becomes the reference group: it is the group coded 0 on every dummy. If all k levels were coded alongside the regression intercept, the dummies would always sum to 1 and be perfectly collinear with the intercept (the so-called dummy variable trap). Which level should you leave out? It's usually the largest group, because it is the one to which all the others will be compared. In this example, suppose the data come from Mexico City, Mexico. The largest group would be Hispanic, so that would be the level left out. Ultimately, which level is left uncoded is up to you, the researcher, and depends on which group you want to compare the others to.
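The k-1 rule can be sketched in plain Python. This is a minimal example, assuming a made-up sample and Hispanic as the reference level as in the Mexico City scenario; the helper `dummy_code` is hypothetical, not a library function. The last lines show why all k columns should not be used together with an intercept: they always sum to 1 in every row, exactly duplicating the intercept's column of ones.

```python
LEVELS = ["Caucasian", "African American", "Asian", "Hispanic", "Other"]


def dummy_code(values, levels, reference):
    """Hypothetical helper: return k-1 dummy columns, omitting the reference level."""
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    coded = [lvl for lvl in levels if lvl != reference]
    # One 0/1 column per non-reference level, in the order of `levels`.
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in coded}


# Made-up observations for illustration only.
sample = ["Hispanic", "Asian", "Hispanic", "Other", "Caucasian", "African American"]
dummies = dummy_code(sample, LEVELS, reference="Hispanic")
for name, col in dummies.items():
    print(f"{name:>16}: {col}")
# Hispanic rows (positions 0 and 2) are all zeros: they form the reference group.

# Dummy variable trap: coding all k levels gives columns that sum to 1 in
# every row, identical to an intercept column of ones.
full = {lvl: [1 if v == lvl else 0 for v in sample] for lvl in LEVELS}
row_sums = [sum(col[i] for col in full.values()) for i in range(len(sample))]
print(row_sums)  # [1, 1, 1, 1, 1, 1]
```

In a fitted regression, each remaining dummy's coefficient is then read as a comparison with the Hispanic reference group. If you use pandas instead, `get_dummies` with `drop_first=True` also yields k-1 columns, but it drops the first level in category order rather than a level you choose.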


CITE THIS AS:
Stephanie Glen. "Dummy Variables / Indicator Variable: Simple Definition, Examples" From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/dummy-variables/