## What is Latent Class Analysis?

Latent Class Analysis (LCA) is a way to uncover hidden groupings in data. More specifically, it’s a way to group subjects from multivariate data into “latent classes”: **groups or subgroups whose shared membership is not directly observable**.

**Latent** implies that the analysis is based on an error-free latent variable (Collins & Lanza, 2013). **Classes** are groups formed by uncovering hidden (*latent*) patterns in data.

## Latent Variables and Classes

A **latent variable** or “hidden” variable (also called a *construct*) is a variable that isn’t directly measurable or observable. What happens instead is that the observed variables in your data act as indicators to measure the latent variables.

**Variables can be latent (non-observable) for a variety of reasons.** People may not want to be honest, or they may not be aware of the relevant factors themselves. For example, a person’s level of neurosis, conscientiousness or openness are all latent variables; they are almost impossible to measure directly. Examples of **latent classes**, where participants can form groups based on these hidden variables, include:

- People based on how much they drink, what eating disorders they have, or what neuroses they suffer from.
- Patients based on phobia types.
- Teenagers based on risk factors, for example cocaine use, glue sniffing, or drunk driving.

Latent Class Analysis **uncovers hidden patterns of association that can exist between observations**. Conditional probability patterns, indicating the chance variables will take on certain values, create the basis for latent class formation.
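To make the role of conditional probabilities concrete, here is a minimal sketch in Python. It assumes a hypothetical two-class model with three yes/no indicators, and it assumes the indicators are independent within each class (the usual "local independence" assumption). All of the numbers, and the function name `posterior_membership`, are made up for illustration; they are not estimates from real data.

```python
import numpy as np

# Hypothetical two-class model with three binary (yes/no) indicators.
# class_prevalence[c] = P(class c); values are illustrative only.
class_prevalence = np.array([0.7, 0.3])

# item_prob[c, j] = P(indicator j = "yes" | class c); values are illustrative only.
item_prob = np.array([
    [0.1, 0.2, 0.1],   # class 0: rarely answers "yes"
    [0.8, 0.7, 0.9],   # class 1: usually answers "yes"
])


def posterior_membership(response, class_prevalence, item_prob):
    """Return P(class | response pattern), assuming local independence."""
    response = np.asarray(response)
    # Probability of the observed response pattern within each class.
    likelihood = np.prod(
        item_prob ** response * (1 - item_prob) ** (1 - response), axis=1
    )
    # Bayes' rule: weight by class prevalence, then normalize.
    joint = class_prevalence * likelihood
    return joint / joint.sum()


# A respondent who answered yes, yes, no.
posterior = posterior_membership([1, 1, 0], class_prevalence, item_prob)
print(posterior.round(3))
```

The respondent would be assigned to whichever class has the larger posterior probability. In a real analysis the prevalences and conditional probabilities are not known in advance; they are estimated from the data.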

LCA works with categorical indicators: binary data, nominal variables, and ordered categorical (ordinal) variables such as Likert scale items. It isn’t designed for continuous indicators.

## LCA vs. Cluster Analysis and Factor Analysis

Latent Class Analysis is similar to cluster analysis. Observed data is analyzed, connections are found, and the data is grouped into clusters.

LCA is also similar to Factor Analysis. The main difference is that Factor Analysis deals with correlations between variables, while LCA is concerned with the structure of groups (or *cases*). Another difference is that LCA uses discrete (categorical) latent variables that have a multinomial distribution, while Factor Analysis uses continuous latent variables that have a normal distribution. Ruscio and Ruscio (2008) outline the differences between the two:

- **Categorical latent variables (LCA)**: “…qualitative differences exist between groups of people or objects”.
- **Continuous latent variables (Factor Analysis)**: “…people or objects differ quantitatively along one or more continua.”

Factor Analysis has been around for much longer than Latent Class Analysis. The need for LCA grew out of the social sciences, where many variables are not found on a continuum. Allan McCutcheon (1987) gives the example of a typology, a specific group of variables. Theoretically, any combination of these variables *could* happen, but only a few of them *do* happen. LCA gives the social scientist a way to limit these typologies to the few combinations of interest.

## Types of Latent Class Analysis

LCA falls into three broad categories:

- **Cluster models:** identify *clusters* that group people together based on similar behaviors, characteristics, interests, or values. *K*-category latent variables represent the clusters. The number and size of the classes are not known beforehand.
- **Factor models:** identify *factors* that group together variables with a common source of variation.
- **Regression models:** predict a dependent variable as a function of predictors.
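As a rough illustration of the cluster-model case, the sketch below fits a latent class model to binary data with a basic expectation-maximization (EM) loop written in NumPy. This is an assumption-laden toy rather than a reference implementation: the function name `fit_lca`, the simulated data, and the choice of EM with a single random start are all illustrative. Dedicated LCA software typically adds multiple starts, convergence checks, and fit statistics for choosing the number of classes.

```python
import numpy as np


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Toy EM fit of a latent class model to a 0/1 data matrix (illustrative)."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    prevalence = np.full(n_classes, 1.0 / n_classes)
    # Random start for P(item = 1 | class); breaks the symmetry between classes.
    item_prob = rng.uniform(0.25, 0.75, size=(n_classes, n_items))

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each observation.
        log_lik = (data @ np.log(item_prob).T
                   + (1 - data) @ np.log(1 - item_prob).T)
        log_joint = log_lik + np.log(prevalence)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class sizes and item-response probabilities.
        class_weight = post.sum(axis=0) + 1e-12
        prevalence = class_weight / class_weight.sum()
        item_prob = (post.T @ data) / class_weight[:, None]
        item_prob = np.clip(item_prob, 1e-6, 1 - 1e-6)  # keep logs finite

    return prevalence, item_prob, post


# Simulated data: two hidden groups answering four yes/no items (made-up values).
rng = np.random.default_rng(42)
true_class = rng.random(1000) < 0.4
true_prob = np.where(true_class[:, None],
                     [0.9, 0.8, 0.85, 0.9],    # group that mostly answers "yes"
                     [0.1, 0.2, 0.15, 0.1])    # group that mostly answers "no"
data = (rng.random((1000, 4)) < true_prob).astype(float)

prevalence, item_prob, post = fit_lca(data, n_classes=2)
assigned = post.argmax(axis=1)  # most likely class for each person
print(prevalence.round(2))
print(item_prob.round(2))
```

Note that the class labels are arbitrary: the fitted "class 0" may correspond to either simulated group. The number of classes is fixed at two here only because the data were simulated that way; in practice it is unknown and has to be chosen by comparing models.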

## Software for Latent Class Analysis

**Many popular statistical software programs, like IBM SPSS, do not have the capability for running LCA.** At the time of writing, IBM does plan to add LCA to SPSS in the future. Programs that do support LCA include R and SAS. Other, less well-known programs are also available, some of which, like MLLSA, are free.

## Latent Transition Analysis

Latent Transition Analysis is an extension of Latent Class Analysis for longitudinal data (as opposed to the cross-sectional data used in LCA). LTA uncovers movement between the subgroups over time. You can only use LTA if you have longitudinal data (e.g. data from a retrospective longitudinal study). The term *Latent Class Model* is sometimes used as an umbrella term to describe both LCA and LTA.
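The idea of movement between subgroups can be sketched with a transition matrix. The example below assumes three hypothetical classes and made-up probabilities; it only shows how class prevalences at one time point and a matrix of transition probabilities combine to give prevalences at the next time point. In an actual LTA these quantities are estimated from the longitudinal data rather than supplied by hand.

```python
import numpy as np

# Hypothetical latent classes and their prevalences at time 1 (illustrative values).
class_labels = ["non-user", "occasional user", "heavy user"]
prevalence_t1 = np.array([0.6, 0.3, 0.1])

# transition[i, j] = P(class j at time 2 | class i at time 1).
# Each row sums to 1; values are made up for illustration.
transition = np.array([
    [0.85, 0.12, 0.03],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
])

# Expected class prevalences at time 2.
prevalence_t2 = prevalence_t1 @ transition

for label, p1, p2 in zip(class_labels, prevalence_t1, prevalence_t2):
    print(f"{label}: {p1:.3f} -> {p2:.3f}")
```

The diagonal of the matrix gives the probability of staying in the same class, and the off-diagonal entries describe movement between classes over time.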

## References

Collins, L. & Lanza, S. (2013). Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. John Wiley & Sons.

McCutcheon, A. (1987). Latent Class Analysis, Issue 64. SAGE.

Ruscio, J. & Ruscio, A. (2008). Advancing psychological science through the study of latent structure. Current Directions in Psychological Science, 17, 203-207.
