## What Is Hotelling’s t-squared?

**Hotelling’s T-Squared, sometimes written as T**, is the multivariate counterpart of the T-test [1]. “Multivariate” means that you have data for more than one parameter for each sample. For example, let’s say you wanted to compare how well two different sets of students performed in school. You could compare univariate data (e.g. mean test scores) with a t-test. Or, you could use Hotelling’s T-squared to compare multivariate data (e.g. the mutivariate mean of test scores, GPA, and class grades). Hotelling’s T-Squared is based on Hotelling’s T

^{2}^{2}distribution and forms the basis for various multivariate control charts. It can also describe the Mahalanobis distance between two populations and can also be used to identify outliers or nonconformities in a data set. Formally, Hotelling’s T-squared distribution is defined as follows [2]: Suppose that a vector

**d**is normally distributed with a mean of zero and unit covariance matrix N

_{p}(0, I), and M is an

*m**

*p*matrix with a Wishart distribution with unit scale matrix and

*n*degrees of freedom W

_{p}(I, m). Then, m

**d**

^{T}M

^{-1}

**d**has a two-parameter Hotelling distribution T

^{2}(p,m). Where

- A
**covariance matrix**shows the correlation between each pair of variables in a multivariate dataset. Covariance is a measure of how much two variables vary together. - A
**unit scale matrix**is a covariance matrix where all variances are equal to 1, which means that all the variables are equally spread out around their means.

## Test Versions

Two versions of the test exist with the following null hypotheses:**One sample**: The multivariate vector means for a group equals a hypothetical vector of means.**Two sample**: The multivariate vector of means for two groups are equal.

## Two-sample Hotelling’s t-squared

If you know how to run a two sample t-test, then you know how to run a two-sample Hotelling’s T-squared.**The basic steps are the same,**although you’ll use a different formula to calculate the t-squared value and you’ll use a different table (the F-table) to find the critical value. Hotelling’s T-squared has

**several advantages**over the t-test [3]:

- The Type I error rate is well controlled,
- The relationship between multiple variables is taken into account,
- It can generate an overall conclusion even if multiple (single) t-tests are inconsistent. While a t-test will tell you which variable differ between groups, Hotelling’s summarizes the between-group differences.

**test hypotheses**are:

**Null hypothesis**(H_{0}): the two samples are from populations with the same multivariate mean.**Alternate hypothesis**(H_{1}): the two samples are from populations with different multivariate means.

**Three major assumptions**are that the samples:

- …have underlying normal distributions.
- …are independent.
- …have equal variance-covariance matrices (for the two sample test only). Run Bartlett’s test to check this assumption.

^{2}is first transformed into an F-statistic:

**Where**:

- N
_{1}& N_{2}= sample sizes, - p = number of variables measured,
- N
_{1}+ N_{2}– p – 1 = degrees of freedom.

## Why Is Hotelling’s T-Squared Distribution Important?

Hotelling’s T^{2}is an important tool for identifying changes in means between multiple populations. By using linear combinations of variables, it allows us to compare multiple samples at once, instead of having to run separate tests for each sample. This makes it much easier and faster to identify any meaningful changes in means between populations over time or across different groups of people. Additionally, because it uses a chi-squared test statistic, it helps us determine whether or not these changes are statistically significant – something that traditional t-tests cannot do on their own.

## References

- Hotelling H (1931) The generalization of Student’s ratio. Ann Math Stat. 2(3):360–378.
- Weisstein, E. (2002). CRC Concise Encyclopedia of Mathematics. CRC Press.
- Fang, J. (2017). Handbook of Medical Statistics.