What is a Grouping Variable?
A grouping variable (also called a coding variable, group variable, or by variable) sorts the data within a data file into categories or groups. It tells the software how you’ve divided your data into those groups. Grouping variables can be:
- Categorical variables: a category such as “Male” or “Female,” or “Control Group” or “Experimental Group.”
- Binary (logical) variables: a binary digit, 0 or 1.
- Numeric variables: a number, like 1, 2, or 3.
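As a minimal sketch of what software does with a grouping variable, the Python below collects each observation's score into one list per group. The records, the field names `score` and `group`, and the helper `split_by_group` are hypothetical illustrations, not any statistics package's API.

```python
from collections import defaultdict

# Hypothetical data: each record holds a score and its grouping variable.
records = [
    {"score": 12, "group": "Control"},
    {"score": 15, "group": "Experimental"},
    {"score": 11, "group": "Control"},
    {"score": 17, "group": "Experimental"},
    {"score": 14, "group": "Experimental"},
]


def split_by_group(rows, value_key, group_key):
    """Collect the values of value_key into one list per distinct group_key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return dict(groups)


groups = split_by_group(records, "score", "group")
print(groups)
```

Because the function only compares group values for equality, the same code works whether the grouping variable holds labels like "Control", binary codes 0 and 1, or numeric codes 1, 2, and 3.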
Usually, you can name a group anything, as long as the name makes sense to you (and you tell the software about your naming convention). For example, if you have an experimental group and a control group, you could name the groups:
- 0 and 1 (binary).
- EXPERIMENTAL and CONTROL (categorical). You could also name them EXPER. and CONTR., or E and C.
You could even call your groups X and Y, although it makes more sense to keep the category names meaningful. If you look back at your data in 10 years, you’ll be glad you chose names that jog your memory.
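Telling the software about your naming convention often amounts to keeping a codebook that links each stored code to a meaningful label. The sketch below assumes a hypothetical `CODEBOOK` dictionary using the 0/1 coding above (which code stands for which group is an arbitrary choice made for illustration) and a hypothetical `decode` helper.

```python
# Hypothetical codebook: maps the codes stored in the data file to readable labels.
CODEBOOK = {0: "CONTROL", 1: "EXPERIMENTAL"}

# Hypothetical raw grouping-variable column, stored as binary codes.
raw_codes = [1, 0, 0, 1, 1]


def decode(codes, codebook):
    """Translate each stored code into its label; raise on codes the codebook lacks."""
    labels = []
    for code in codes:
        if code not in codebook:
            raise ValueError(f"code {code!r} is not defined in the codebook")
        labels.append(codebook[code])
    return labels


print(decode(raw_codes, CODEBOOK))
```

Raising an error on an undefined code, instead of silently passing it through, catches data-entry mistakes such as a stray 2 in a 0/1 variable.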
Note: Some authors use “grouping variable” as a synonym for the independent variable in tests like MANOVA.
Use in Software
Grouping variables are mainly used in statistical software, and each program has its own quirks and requirements when it comes to naming them. For example:
- In Statistica, the group is usually identified by a number (e.g., Group 1, 2, or 3) or by a categorical label, like MALE or FEMALE. These values are called codes, and you can specify up to 1,000 of them.
- In SPSS, grouping variables are defined on the worksheet and specified within a test window (for example, the Independent Samples T Test or Tests for Several Independent Samples window). Say your worksheet codes males as 1 and females as 2. In the T Test window, move the independent variable (the one holding those codes) to the Grouping Variable box. Then click “Define Groups” and enter the code for each group (e.g., 1 and 2).
- In MATLAB, you’ve got many options in addition to numeric or categorical variables. For example, character arrays can store multiple characters, while cell arrays can store multiple strings in the same variable (“cell array of strings”).
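The SPSS workflow described above (codes 1 and 2 in the worksheet, then Define Groups) can be imitated in plain Python to show what the software does with those codes: it splits the test variable into two samples and compares them. This is a minimal sketch, not SPSS's implementation; the data and the helpers `define_groups` and `pooled_t` are hypothetical, and only the pooled-variance (equal variances assumed) Student's t statistic is computed.

```python
import math
from statistics import mean, variance

# Hypothetical worksheet: grouping variable "sex" coded 1 = male, 2 = female.
sex = [1, 2, 1, 2, 1, 2, 1, 2]
score = [4, 7, 6, 9, 5, 8, 5, 8]


def define_groups(values, grouping, code_a, code_b):
    """Split values into two samples using the codes that identify each group."""
    a = [v for v, g in zip(values, grouping) if g == code_a]
    b = [v for v, g in zip(values, grouping) if g == code_b]
    return a, b


def pooled_t(a, b):
    """Return Student's independent-samples t statistic (pooled variance) and its df."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


group_1, group_2 = define_groups(score, sex, 1, 2)
t, df = pooled_t(group_1, group_2)
print(f"t = {t:.3f}, df = {df}")
```

With these made-up scores, t is about -5.20 on 6 degrees of freedom. Each group needs at least two observations, because `statistics.variance` raises `StatisticsError` otherwise.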