Statistics How To

Dummy Variables

What are Dummy Variables?

Dummy variables are used in regression analysis. As the name implies, they are artificial variables, coded 0 or 1, that stand in for a categorical variable with two or more categories (levels). They are used when you want to work with categorical variables whose levels have no quantifiable relationship with each other.

For example, race can be categorized as Caucasian, African American, Asian, Hispanic, or Other. If you assigned the numbers 1-5 to these categories when performing regression analysis, the results would make no sense at all (is the “Other” category in any way 5 times the “Caucasian” category?). However, if you create a variable called Caucasian and assign it the value 1 to mean “is Caucasian” and 0 to mean “is not Caucasian”, then you can start to see how dummy variables are useful.
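The 0/1 coding described above can be sketched in a few lines of Python. This is a minimal illustration only: the sample values and the helper name `make_dummy` are hypothetical and not part of the original article.

```python
# Hypothetical sample data; the values are illustrative only.
races = ["Caucasian", "Asian", "Other", "Caucasian", "Hispanic"]


def make_dummy(values, level):
    """Return 1 where a value equals `level` and 0 everywhere else."""
    return [1 if value == level else 0 for value in values]


# The dummy variable "Caucasian": 1 = is Caucasian, 0 = is not Caucasian.
is_caucasian = make_dummy(races, "Caucasian")
print(is_caucasian)  # [1, 0, 0, 1, 0]
```

Unlike the codes 1-5, the 0/1 values imply no ordering or magnitude among the categories; they only record membership.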

Coding Categorical Variables with Multiple Levels

If a categorical variable has more than two levels (the distinct groups within a single independent variable), you need to create multiple dummy variables. In the above example, the categorical variable “Race” has five levels (Caucasian, African American, Asian, Hispanic, Other). The number of dummy variables to code is k-1, where “k” is the number of levels, so only four of these five levels get a dummy variable. The level left out becomes the reference group: an observation in that group has 0 on every dummy variable, and each dummy variable's coefficient is interpreted relative to it. Which level should you leave out? It’s usually the largest group, to which all the others will be compared. For example, if the data were for Mexico City, Mexico, the largest group would be Hispanic, and that would be the level left out. Ultimately, which level is left uncoded is up to you, the researcher, and depends on which group you want to compare the others to.
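As a rough sketch of the k-1 rule, the following Python codes the five-level “Race” variable into four dummy variables, leaving out a chosen reference level. The function name `dummy_code` and the sample observations are assumptions made for illustration; they do not come from the original article.

```python
LEVELS = ["Caucasian", "African American", "Asian", "Hispanic", "Other"]


def dummy_code(values, levels, reference):
    """Code each value as k-1 indicator columns, omitting `reference`."""
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    # k levels minus the reference level leaves k-1 dummy variables.
    coded_levels = [level for level in levels if level != reference]
    rows = [
        {level: int(value == level) for level in coded_levels}
        for value in values
    ]
    return coded_levels, rows


# Hypothetical observations, with Hispanic as the reference group.
sample = ["Hispanic", "Asian", "Hispanic", "Other"]
columns, rows = dummy_code(sample, LEVELS, reference="Hispanic")

print(columns)
for row in rows:
    print(row)
```

A row for the reference group (Hispanic here) is all zeros, which is why it needs no dummy variable of its own. In practice, libraries such as pandas provide this through `get_dummies`, whose `drop_first` option drops the first level rather than one you name.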

Dummy Variables was last modified: October 15th, 2017 by Andale