You may want to read this article first: What is an endogenous variable?

## What is the Hausman Test?

The Hausman test (also called the Hausman specification test) detects endogenous regressors (predictor variables) in a regression model. Endogenous variables have values that are determined by other variables in the system. Endogenous regressors make ordinary least squares (OLS) estimators biased and inconsistent, because one of the assumptions of OLS is that there is no correlation between a predictor variable and the error term. Instrumental variables estimators can be used as an alternative in this case. However, before you can decide on the best regression method, you first have to figure out whether your predictor variables are endogenous. This is what the Hausman test does.

This test is also called the Durbin–Wu–Hausman (DWH) test or the augmented regression test for endogeneity.
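
To make the "augmented regression" idea concrete, here is a minimal sketch with simulated data. It assumes a single suspect regressor `x` and a single valid instrument `z`. The helper names `ols` and `endogeneity_t_stat`, the coefficients, and the sample size are all hypothetical choices for illustration, not part of any particular package. The idea is to regress `x` on the instrument, add the residual from that regression to the original model, and check whether its coefficient differs from zero.

```python
import numpy as np

def ols(X, y):
    """OLS fit: return coefficients, standard errors, and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid

def endogeneity_t_stat(y, x, z):
    """t-statistic on the first-stage residual in the augmented regression."""
    const = np.ones(len(y))
    # Stage 1: regress the suspect regressor x on the instrument z.
    _, _, v_hat = ols(np.column_stack([const, z]), x)
    # Stage 2: add the stage-1 residual to the original regression of y on x.
    beta, se, _ = ols(np.column_stack([const, x, v_hat]), y)
    # Under the null (x is exogenous) the coefficient on v_hat is zero.
    return beta[2] / se[2]

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)  # instrument (assumed valid)
u = rng.normal(size=n)  # error term of the outcome equation

# Endogenous case: x is built partly from the error term u.
x_endog = 0.8 * z + 0.6 * u + rng.normal(size=n)
t_endog = endogeneity_t_stat(1.0 + 2.0 * x_endog + u, x_endog, z)

# Exogenous case: x is unrelated to u.
x_exog = 0.8 * z + rng.normal(size=n)
t_exog = endogeneity_t_stat(1.0 + 2.0 * x_exog + u, x_exog, z)

print(f"endogenous x: t = {t_endog:.2f}")
print(f"exogenous x:  t = {t_exog:.2f}")
```

A large absolute t-statistic on the residual term points toward endogeneity; a value near zero gives no evidence against OLS. With several suspect regressors, the same idea would use a joint F-test on all of the first-stage residuals.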

## Use in Panel Data Analysis

The Hausman test is sometimes described as a test for model misspecification. In **panel data analysis** (the analysis of data over time), the Hausman test can help you choose between a fixed effects model and a random effects model. The null hypothesis is that the preferred model is random effects; the alternative hypothesis is that the preferred model is fixed effects. Essentially, the test looks to see whether there is a correlation between the unique errors and the regressors in the model. The null hypothesis is that there is no correlation between the two.
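
The fixed effects vs. random effects comparison can be sketched by hand for the simplest case. The code below is an illustration under stated assumptions, not how any specific package computes the test: it assumes a balanced panel, a single regressor, and a Swamy-Arora style estimate of the variance components. The function name `hausman_fe_re` and the simulated data are hypothetical.

```python
import math
import numpy as np

def hausman_fe_re(y, x, ids):
    """Hausman statistic and p-value for one regressor in a balanced panel."""
    groups, idx = np.unique(ids, return_inverse=True)
    n_groups = len(groups)
    t_per = len(y) // n_groups  # periods per unit (balanced panel assumed)

    def group_mean(v):
        return np.bincount(idx, weights=v) / t_per

    xbar, ybar = group_mean(x), group_mean(y)

    # Fixed effects (within) estimator: demean by unit.
    xw, yw = x - xbar[idx], y - ybar[idx]
    sxx_w = xw @ xw
    b_fe = (xw @ yw) / sxx_w
    resid_w = yw - b_fe * xw
    sigma_e2 = resid_w @ resid_w / (len(y) - n_groups - 1)
    var_fe = sigma_e2 / sxx_w

    # Between regression on unit means, used to estimate the unit-effect variance.
    xb, yb = xbar - xbar.mean(), ybar - ybar.mean()
    resid_b = yb - ((xb @ yb) / (xb @ xb)) * xb
    sigma_b2 = resid_b @ resid_b / (n_groups - 2)
    sigma_u2 = max(sigma_b2 - sigma_e2 / t_per, 0.0)

    # Random effects estimator: quasi-demean by a fraction theta of the unit mean.
    theta = 1.0 - math.sqrt(sigma_e2 / (t_per * sigma_u2 + sigma_e2))
    xs, ys = x - theta * xbar[idx], y - theta * ybar[idx]
    xs_c, ys_c = xs - xs.mean(), ys - ys.mean()  # partial out the intercept
    sxx_re = xs_c @ xs_c
    b_re = (xs_c @ ys_c) / sxx_re
    var_re = sigma_e2 / sxx_re

    # Hausman statistic; with one coefficient it is chi-square with 1 df,
    # whose upper tail probability equals erfc(sqrt(h / 2)).
    h = (b_fe - b_re) ** 2 / (var_fe - var_re)
    p_value = math.erfc(math.sqrt(h / 2.0))
    return b_fe, b_re, h, p_value

rng = np.random.default_rng(1)
n_units, n_periods = 200, 5
ids = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)[ids]      # unit-specific effect
x = alpha + rng.normal(size=ids.size)      # regressor correlated with the effect
y = 1.0 + 2.0 * x + alpha + rng.normal(size=ids.size)

b_fe, b_re, h, p_value = hausman_fe_re(y, x, ids)
print(f"FE slope = {b_fe:.3f}, RE slope = {b_re:.3f}")
print(f"H = {h:.1f}, p-value = {p_value:.3g}")
```

Because the simulated unit effect is correlated with the regressor, the two slopes diverge and the small p-value rejects the random effects null in favor of fixed effects. Real software handles unbalanced panels and several regressors, so its numbers may differ from this hand-rolled version.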

## Test Results

Interpreting the result from a Hausman test is fairly straightforward: if the p-value is small (less than 0.05), reject the null hypothesis. The problem is that many versions of the test exist, with different hypotheses and possible conclusions. In fact, some of the available tests suggest *“…opposite conclusions about the null hypothesis”* (Chmelarova, 2007). **Check your software and make sure you know which null hypothesis you are actually accepting or rejecting.**

One of the more common forms of the test compares two estimators. For example, in this Stata example, you’re comparing one estimator against another. The null hypothesis is that one of them is an efficient (and consistent) estimator of the true population parameters.
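
In this estimator-comparison form, the statistic from Hausman (1978) is usually written as follows. The symbols are notation introduced here for illustration, not taken from the Stata example: $\hat{\beta}_0$ is the estimator that is efficient under the null hypothesis, and $\hat{\beta}_1$ is the estimator that stays consistent whether or not the null holds.

$$
H = (\hat{\beta}_1 - \hat{\beta}_0)^{\top}
\left[ \widehat{\operatorname{Var}}(\hat{\beta}_1) - \widehat{\operatorname{Var}}(\hat{\beta}_0) \right]^{-1}
(\hat{\beta}_1 - \hat{\beta}_0)
$$

Under the null hypothesis, $H$ approximately follows a chi-square distribution with degrees of freedom equal to the number of coefficients being compared. A large $H$ (small p-value) means the two estimators disagree by more than sampling error would explain.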

A slightly different interpretation of the test can be seen in this R example, where the alternative hypothesis is that the errors are correlated with the regressors and the null hypothesis is that they are not. This tests for fixed effects (correlated errors) vs. random effects in panel data.

**References**:

Chmelarova, V. (2007). The Hausman Test, and Some Alternatives, with Heteroskedastic Data. Louisiana State University and Agricultural & Mechanical College. Retrieved 1/6/2007 from http://etd.lsu.edu/docs/available/etd-01242007-165928/unrestricted/Chmelarova_dis.pdf

Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
