
## What is a Grouping Variable?

A **grouping variable** (also called a *coding variable*, *group variable* or *by variable*) sorts data within data files into categories or groups. It tells a computer system how you’ve sorted data into groups. Grouping variables can be:

- Categorical variables: a category label, like “Male” or “Female,” or “Control Group” or “Experimental Group.”
- Binary (logical) variables: a binary digit, 0 or 1.
- Numeric variables: a number, like 1, 2, or 3.

Usually, you can name a group anything, as long as it makes sense to you and you tell the software about your naming convention. For example, if you have an experimental group and a control group, you could name the groups:

- 0 and 1 (binary).
- EXPERIMENTAL and CONTROL (categorical). You could also name them EXPER. and CONTR., or E and C.
- 1 and 2 (numeric).

You could even label your groups X and Y, although it makes more sense to keep the category names meaningful. If you look back at your data in 10 years, you’ll be glad you created names that jog your memory.
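To make the naming options concrete, here is a minimal Python sketch. The six subjects and the two codebooks (`binary_codes`, `numeric_codes`) are made up for illustration, as is the helper `members`. It shows that the same groups can be coded as labels, as 0/1, or as 1/2 without changing who belongs to which group, provided the codebook is written down.

```python
# Hypothetical data: six subjects recorded with categorical labels.
labels = ["EXPERIMENTAL", "CONTROL", "CONTROL",
          "EXPERIMENTAL", "CONTROL", "EXPERIMENTAL"]

# Codebooks (assumed for illustration) that document each naming convention.
binary_codes = {"CONTROL": 0, "EXPERIMENTAL": 1}
numeric_codes = {"EXPERIMENTAL": 1, "CONTROL": 2}

# Recode the same grouping variable under each convention.
binary = [binary_codes[label] for label in labels]
numeric = [numeric_codes[label] for label in labels]


def members(grouping, value):
    """Return the row positions whose grouping variable equals `value`."""
    return [i for i, g in enumerate(grouping) if g == value]


print(binary)                           # [1, 0, 0, 1, 0, 1]
print(numeric)                          # [1, 2, 2, 1, 2, 1]
print(members(labels, "EXPERIMENTAL"))  # [0, 3, 5]
```

Whichever coding is used, the experimental group is rows 0, 3, and 5. Only the codebook tells you that `1` means EXPERIMENTAL, which is why meaningful names (or a saved codebook) matter.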

**Note**: Sometimes, an author might use “grouping variable” as a synonym for the independent variable in tests like MANOVA.

## Use in Software

Grouping variables are typically used in statistical software. Each package has its own quirks and requirements when it comes to naming and coding groups. For example:

- In **Statistica**, the group is usually identified by a number (e.g. Group 1, 2, or 3) or by a categorical label, like MALE or FEMALE. These values are called *codes*, and you can specify up to 1,000 of them.
- In **SPSS**, grouping variables are defined on the worksheet and specified within a test window (for example, the Independent Samples T Test or Tests for Several Independent Samples window). For example, let’s say your worksheet has 1 for male and 2 for female. In the T Test window, move the independent variable down to the Grouping Variable box. Click “Define Groups” and enter the values that identify your groups (in this case, 1 and 2).
- In **MATLAB**, you’ve got many options in addition to numeric or categorical variables. For example, character arrays can store multiple characters, while cell arrays can store multiple strings in the same variable (“cell array of strings”).
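As a rough illustration of what software does once the group values are defined, the Python sketch below splits a measurement column by a grouping variable coded 1 for male and 2 for female, as in the SPSS example above. The worksheet rows, the `score` column, and the helper `split_by_group` are hypothetical. This is not how any particular package is implemented, only the general idea.

```python
from statistics import mean

# Hypothetical worksheet: one row per subject.
# "sex" is the grouping variable (1 = male, 2 = female); "score" is the measurement.
rows = [
    {"sex": 1, "score": 72.0},
    {"sex": 2, "score": 80.0},
    {"sex": 1, "score": 68.0},
    {"sex": 2, "score": 84.0},
    {"sex": 1, "score": 70.0},
]


def split_by_group(rows, group_key, value_key, group_values):
    """Collect the measurements for each requested group value."""
    groups = {g: [] for g in group_values}
    for row in rows:
        # Rows whose code is not one of the defined groups are skipped.
        if row[group_key] in groups:
            groups[row[group_key]].append(row[value_key])
    return groups


# "Define Groups": tell the code which values identify the two groups.
groups = split_by_group(rows, "sex", "score", group_values=(1, 2))
group_means = {g: mean(values) for g, values in groups.items()}
print(group_means)  # {1: 70.0, 2: 82.0}
```

The two lists in `groups` are the samples that a two-group test, such as an independent samples t test, would then compare.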
