
As the name suggests, categorical variables are those variables that fall into a particular **category**. Hair color, gender, college major, college attended, political affiliation, disability, or sexual orientation are all categories that could have lists of categorical variables. Usually, the variables take one of a limited number of fixed values in a set.

For example:

- The category “hair color” could contain the categorical variables “black,” “brown,” “blond,” and “red.”
- The category “gender” could contain the categorical variables “male,” “female,” and “other.”

Note that “hair color” and “gender” are the categories, not categorical variables themselves. A categorical variable is the value that each subject in a study takes, and that value varies from person to person. Say you survey people and ask them their hair color. They would respond with a categorical variable such as black, brown, blond, or red; they wouldn’t respond “hair color.”
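To make the “fixed set of values” idea concrete, here is a minimal Python sketch that records survey answers for hair color. The `HairColor` enum and the `record_response` helper are hypothetical names chosen for illustration; the point is that each response must be one of a small, fixed set of values, and answering with the category name itself is rejected.

```python
from enum import Enum

# Hypothetical fixed set of allowed values for the category "hair color"
class HairColor(Enum):
    BLACK = "black"
    BROWN = "brown"
    BLOND = "blond"
    RED = "red"

def record_response(answer: str) -> HairColor:
    """Convert a survey answer into one of the fixed hair-color values.

    Raises ValueError if the answer is not an allowed value, for example
    if someone answers with the category name "hair color".
    """
    return HairColor(answer.strip().lower())

responses = ["Brown", "black", "red", "brown"]
recorded = [record_response(r) for r in responses]
print([c.value for c in recorded])  # ['brown', 'black', 'red', 'brown']
```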

### Is there an order to categorical variables?

There is no order to categorical variables; in other words, they aren’t ranked from highest to lowest or lowest to highest. For example, there is no intrinsic order to the categories male and female. If there *is* some kind of order, the variables are called **ordinal variables** instead. For example, you could categorize house prices as cheap, moderate, and expensive. Although these are categories, there is a clear order (with cheap at the bottom and expensive at the top).
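The difference shows up as soon as you try to sort the values. In this sketch (plain Python lists; `PRICE_ORDER` is a hypothetical name), sorting hair colors only gives alphabetical order, which means nothing, while the house-price categories need their natural order spelled out explicitly, because alphabetical order would put “expensive” before “moderate.”

```python
# Nominal variable: hair colors have no intrinsic order, so sorting
# them only produces alphabetical order, which carries no meaning.
hair = ["red", "black", "blond"]
print(sorted(hair))  # ['black', 'blond', 'red']

# Ordinal variable: house-price categories have a natural order,
# which has to be stated explicitly (lowest to highest here).
PRICE_ORDER = ["cheap", "moderate", "expensive"]
prices = ["expensive", "cheap", "moderate", "cheap"]

print(sorted(prices))                         # alphabetical: ['cheap', 'cheap', 'expensive', 'moderate']
print(sorted(prices, key=PRICE_ORDER.index))  # ranked: ['cheap', 'cheap', 'moderate', 'expensive']
```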

## Categorical Variables vs Quantitative Variables

Quantitative variables are the “x” and “y” of stats: they are numbers that can be added, subtracted, multiplied, or divided. Categorical variables, on the other hand, are **descriptions**: they label the groups that those numbers belong to.

### Examples of categorical variables:

- Brand of toothpaste (Colgate, Aquafresh…)
- College major (English, Math…)
- Telephone company (Bell South, AT&T…)
- Checking account location (Jacksonville, New York City…)
- School attended (Lee High, Wescott High…)

### Examples of quantitative variables:

- Number of toothpaste tubes used per year.
- G.P.A. for college major.
- Bytes of data uploaded from your phone.
- Checking account balance.
- Average number of students in a class.

### Assigning quantities to a categorical variable

It is possible to assign a **quantity** to a categorical variable. For example, you might record hair colors as 1: black, 2: brown, 3: blond, or 4: red. This makes the data easier to analyze and manipulate, especially in a spreadsheet, but it doesn’t change the fact that the variable is still categorical. **It doesn’t become a quantitative variable just because you assigned it a number.**
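A short Python sketch shows why the numeric codes are still just labels. It uses the 1–4 hair-color coding described above on a small set of hypothetical coded responses: the average of the codes can be computed, but “hair color 2.33” means nothing, whereas counting how often each code occurs is a meaningful summary.

```python
from collections import Counter

# Numeric codes used purely as labels (1: black, 2: brown, 3: blond, 4: red)
CODES = {1: "black", 2: "brown", 3: "blond", 4: "red"}

coded = [1, 2, 2, 4, 3, 2]  # hypothetical coded survey responses

# Arithmetic runs, but the result is meaningless: there is no "hair color 2.33"
mean_code = sum(coded) / len(coded)
print(round(mean_code, 2))  # 2.33

# Counting how often each category appears is the meaningful summary
counts = Counter(CODES[c] for c in coded)
print(counts.most_common())  # [('brown', 3), ('black', 1), ('red', 1), ('blond', 1)]
```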

### Grey areas

Sometimes it’s very difficult to decide whether a variable is **a categorical variable or a quantitative variable**.

For example, my area code is 904. Although it’s technically possible to perform math on area codes (for example, 904 + 518), adding them doesn’t make much sense. The number 904 is actually a categorical variable, representing customers in Northeast Florida. ZIP codes (e.g., 90214) are also categorical.
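In software, one practical consequence is that codes like these are usually best stored as text rather than as numbers. The sketch below uses hypothetical customer records: grouping customers by area code is meaningful, adding codes is possible but meaningless, and converting a code such as “02134” to an integer silently drops its leading zero.

```python
from collections import Counter

# Hypothetical customer records; codes are stored as strings (labels)
customers = [("Ana", "904"), ("Ben", "518"), ("Cy", "904")]

# Meaningful: group or count customers by area code
by_area = Counter(area for _, area in customers)
print(by_area["904"])  # 2

# Possible but meaningless: arithmetic on the codes
print(int("904") + int("518"))  # 1422

# Storing codes as numbers can also corrupt them: leading zeros vanish
print(int("02134"))  # 2134
```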

Another example: most of us have seen customer service surveys where the rating varies from 0 (poor) to 5 (excellent). These types of rating systems are so commonplace that the line between quantitative variable and categorical variable becomes blurred. “Perfect 10” anyone?

## Graphs that display Categorical Variables

Pie charts and bar charts compare the categories of a categorical variable against each other. To make one, count the number of items in each category. For example, how many students are in each classroom? The results can be displayed on a pie chart or a bar graph.
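The counting step behind a pie chart or bar graph can be sketched in a few lines of Python. The student records here are hypothetical; each count gives the height of a bar, and each count divided by the total gives the share of the pie. A row of `#` characters stands in for a real plotted bar.

```python
from collections import Counter

# Hypothetical records: the classroom each student is assigned to
students = ["Room A", "Room B", "Room A", "Room C", "Room A", "Room B"]

counts = Counter(students)    # number of students in each category
total = sum(counts.values())

for room, n in sorted(counts.items()):
    share = 100 * n / total   # percentage, i.e. the size of the pie slice
    print(f"{room}: {'#' * n} {n} ({share:.0f}%)")
```

With this data the output lists Room A with 3 students (50%), Room B with 2 (33%), and Room C with 1 (17%).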

### Contingency tables

A more complicated display is a contingency table.

When you want to see how two categorical variables are related, you can place those variables in a **contingency table**. The table is a display of counts, and sometimes percentages, of individuals who fall into each combination of categories for two or more categorical variables. Because a contingency table shows every category and every count, it makes it easier to spot possible relationships between the variables.
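Building a two-way contingency table amounts to counting every combination of categories. Here is a minimal Python sketch; the `contingency_table` helper and the survey responses (gender and college major) are hypothetical, and combinations that never occur are shown as 0 so every cell of the table is filled.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row category, column category) pairs into counts."""
    cells = Counter(pairs)                # count each combination seen
    rows = sorted({r for r, _ in pairs})  # distinct row categories
    cols = sorted({c for _, c in pairs})  # distinct column categories
    # Combinations that never occur get a count of 0 (Counter's default)
    table = [[cells[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical survey responses: (gender, college major)
records = [
    ("Female", "Math"), ("Male", "English"), ("Female", "English"),
    ("Female", "Math"), ("Male", "Math"), ("Other", "English"),
]

rows, cols, table = contingency_table(records)

# Print the table with a total for each row
print(f"{'':8}" + "".join(f"{c:>9}" for c in cols) + f"{'Total':>7}")
for label, counts in zip(rows, table):
    print(f"{label:8}" + "".join(f"{n:>9}" for n in counts) + f"{sum(counts):>7}")
```

Dividing each cell by its row total would turn the counts into row percentages, which is often how a possible relationship between the two variables is compared.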

## Categorical Data Condition: Overview

The categorical data condition is a check to make sure data are in counts or percentages before you make a pie or bar chart.

When you try to decide what type of graph you should make to display data, it’s important to check whether you have categorical variables or quantitative variables. For example, **pie charts** and **bar graphs** are used to display data that is in **categories**.
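The categorical data condition can be approximated in code, though only as a rough heuristic, not a formal test. This sketch assumes the data arrive as a Python dictionary mapping category labels (strings) to values; the function name `meets_categorical_data_condition` and the brand figures are hypothetical. It accepts non-negative whole-number counts or percentages that add up to about 100, and rejects data whose keys are numbers, which usually signals x,y data.

```python
import math

def meets_categorical_data_condition(data):
    """Heuristic check before drawing a pie or bar chart.

    Assumes `data` maps category labels (strings) to either counts
    (non-negative whole numbers) or percentages that add up to about 100.
    """
    if not data:
        return False
    if not all(isinstance(label, str) for label in data):
        return False  # numeric keys suggest x,y data rather than categories
    values = list(data.values())
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) and v >= 0
               for v in values):
        return False
    are_counts = all(float(v).is_integer() for v in values)
    are_percentages = math.isclose(sum(values), 100, abs_tol=0.5)
    return are_counts or are_percentages

print(meets_categorical_data_condition({"Colgate": 40, "Aquafresh": 25, "Other": 35}))        # True
print(meets_categorical_data_condition({"Colgate": 45.5, "Aquafresh": 30.5, "Other": 24.0}))  # True
print(meets_categorical_data_condition({1.0: 2.3, 2.0: 4.1, 3.0: 5.9}))                       # False
```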

## Why the Categorical Data Condition is Important

If you don’t check the categorical data condition, your pie chart or bar graph **won’t make any sense**. For example, if you have x,y data (numerical data) and you try to make a pie chart, the slices won’t represent parts of any meaningful whole.

**It doesn’t make any sense to make a pie chart from two sets of numbers**. A better choice would be a scatter plot, which shows how the y values change as the x values change.

## The Quantitative Data Condition

In addition to the categorical data condition, there is a **quantitative data condition**. The quantitative data condition is a check to make sure you have quantitative (numerical) data before you make a graph.
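A matching heuristic for the quantitative data condition just checks that every value is a finite number. In this sketch (the function name `meets_quantitative_data_condition` is hypothetical), account balances pass and brand names fail. Note that area codes also pass, which is exactly the grey area described earlier: a code can only tell you whether values are numbers, not whether arithmetic on them makes sense.

```python
import math

def meets_quantitative_data_condition(values):
    """Heuristic: every value is a finite real number (not a label or a bool)."""
    return bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) and math.isfinite(v)
        for v in values
    )

print(meets_quantitative_data_condition([1520.75, 88.10, 0.0]))     # True: account balances
print(meets_quantitative_data_condition(["Colgate", "Aquafresh"]))  # False: brand names
print(meets_quantitative_data_condition([904, 518]))                # True, though area codes are categorical
```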
