Statistics How To

Clustered Standard Errors: Definition

You may want to read this article first: What is the Standard Error of a Sample?

What are Clustered Standard Errors?

Clustered Standard Errors (CSEs) arise when some observations in a data set are related to each other. This correlation occurs when an individual trait, like ability or socioeconomic background, is identical or similar for groups of observations within clusters. Panel data (multi-dimensional data collected over time) is the type of data most often associated with CSEs.

For example, let’s say you wanted to know if class size affects SAT scores. Specifically, you think that smaller class sizes lead to better SAT scores. You collect panel data for dozens of classes in dozens of schools. As this is panel data, you almost certainly have clustering. Teachers might be more effective in some classes than in others, students may be clustered by ability (e.g. special education classes), or some schools might have better access to computers than others. According to Cameron and Miller, failing to account for this clustering leads to incorrect standard errors.

Clustering violates the assumption of independence required by many estimation methods and statistical tests, and the resulting incorrect standard errors can lead to Type I and Type II errors.
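
To get a feel for how far off the default standard errors can be, a widely used rule of thumb (often called the Moulton factor, and discussed in Cameron and Miller's guide) approximates the variance inflation as 1 + rho_x * rho_u * (average cluster size - 1). The sketch below assumes roughly equal-sized clusters; the function names and the input values are illustrative assumptions, not part of the original article.

```python
import math


def variance_inflation(rho_x, rho_u, avg_cluster_size):
    """Approximate factor by which the default OLS variance is too small.

    rho_x: within-cluster correlation of the regressor (assumed known)
    rho_u: within-cluster correlation of the errors (assumed known)
    avg_cluster_size: average number of observations per cluster
    """
    return 1.0 + rho_x * rho_u * (avg_cluster_size - 1)


def se_inflation(rho_x, rho_u, avg_cluster_size):
    """Same idea on the standard-error scale (square root of the variance factor)."""
    return math.sqrt(variance_inflation(rho_x, rho_u, avg_cluster_size))


# Hypothetical inputs: the regressor is constant within a cluster (rho_x = 1),
# errors are mildly correlated within a cluster (rho_u = 0.1),
# and each cluster holds 81 observations.
tau = variance_inflation(rho_x=1.0, rho_u=0.1, avg_cluster_size=81)
print(f"variance inflation factor: {tau:.1f}")
print(f"standard error inflation:  {se_inflation(1.0, 0.1, 81):.1f}")
```

With these inputs, even a modest error correlation of 0.1 makes the true variance about nine times the default, so the default standard errors are about a third of what they should be.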

Adjusting for Clustered Standard Errors

Accurate standard errors are a fundamental component of statistical inference. Therefore, if you have CSEs in your data (which in turn produce inaccurate SEs), you should adjust for the clustering before running any further analysis on the data.

Hand calculations for clustered standard errors are somewhat complicated (compared to your average statistical formula). For example, this snippet from The American Economic Review gives the variance formula for the calculation of the clustered standard errors:
[Image: variance formula for clustered standard errors]
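
For reference, the cluster-robust variance estimator for OLS is commonly written in the "sandwich" form below. This is the textbook version, offered as an assumption about the general shape of such formulas rather than a copy of the journal snippet. Here G is the number of clusters, X_g holds the regressors for cluster g, and \hat{u}_g holds the OLS residuals for cluster g.

```latex
\hat{V}_{\text{cluster}}(\hat{\beta})
  = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right)
    (X'X)^{-1}
```

The clustered standard errors are the square roots of the diagonal entries of this matrix. Software usually multiplies the result by a small-sample correction, which is omitted here.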

It’s usually not necessary to perform these adjustments by hand, as most statistical software packages, such as Stata and SPSS, have options for clustering. When you specify clustering, the software will automatically adjust for CSEs.
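
As a rough illustration of what a clustering option computes, here is a minimal sketch for a one-variable regression, written in plain Python. The function name, the data, and the "each class recorded four times" setup are all hypothetical, chosen only to make the effect of clustering visible. No small-sample correction is applied, so the numbers will not match package output exactly.

```python
import math
from collections import defaultdict


def ols_with_cluster_se(x, y, cluster):
    """Fit y = a + b*x by OLS.

    Returns (slope, conventional SE of slope, cluster-robust SE of slope).
    """
    n = len(x)
    # Entries of X'X and X'y for the design matrix X = [1, x].
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^{-1}: the "bread" of the sandwich.
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Conventional variance s^2 (X'X)^{-1}: assumes independent errors.
    s2 = sum(u * u for u in resid) / (n - 2)
    se_plain = math.sqrt(s2 * inv[1][1])

    # Per-cluster sums X_g' u_g (one 2-vector per cluster).
    cluster_sums = defaultdict(lambda: [0.0, 0.0])
    for xi, ui, g in zip(x, resid, cluster):
        cluster_sums[g][0] += ui
        cluster_sums[g][1] += xi * ui
    # "Meat": sum over clusters of (X_g' u_g)(X_g' u_g)'.
    meat = [[sum(s[i] * s[j] for s in cluster_sums.values()) for j in range(2)]
            for i in range(2)]
    # Slope entry of bread * meat * bread (no small-sample correction).
    var_b = sum(inv[1][i] * meat[i][j] * inv[j][1]
                for i in range(2) for j in range(2))
    return b, se_plain, math.sqrt(var_b)


# Hypothetical data: six classes (class size, mean SAT score).
sizes = [18, 22, 25, 28, 30, 35]
sat = [1120, 1090, 1100, 1040, 1060, 1010]
ids = list(range(len(sizes)))

# The same six classes, each recorded four times: an extreme case in which
# observations within a cluster are perfectly correlated.
reps = 4
sizes_rep = [s for s in sizes for _ in range(reps)]
sat_rep = [s for s in sat for _ in range(reps)]
ids_rep = [i for i in ids for _ in range(reps)]

b0, plain0, clus0 = ols_with_cluster_se(sizes, sat, ids)
b1, plain1, clus1 = ols_with_cluster_se(sizes_rep, sat_rep, ids_rep)
print(f"6 rows:  slope={b0:.3f}  plain SE={plain0:.3f}  clustered SE={clus0:.3f}")
print(f"24 rows: slope={b1:.3f}  plain SE={plain1:.3f}  clustered SE={clus1:.3f}")
```

On the repeated data the conventional standard error shrinks, because the calculation treats the 24 rows as independent. The clustered standard error stays the same as for the six distinct classes, since the repeated rows add no new information.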

One way to account for clustering is to specify a model of the within-cluster error correlation. For example, you could specify a random coefficient model or a hierarchical model. However, the accuracy of any calculated SEs relies completely on your specifying the correct model for within-cluster error correlation. A second option is Cluster-Robust Inference, which does not require you to specify a model. It does, however, assume that the number of clusters approaches infinity, so it can be unreliable when there are only a few clusters (Ibragimov & Muller).

References
Cameron, A. C., & Miller, D. L. A Practitioner’s Guide to Cluster-Robust Inference.
Ibragimov, R., & Muller, U. Inference with Few Heterogeneous Clusters.
Primo, D. The Practical Researcher: Estimating the Impact of State Policies and Institutions with Mixed-Level Data.

Clustered Standard Errors: Definition was last modified: October 21st, 2017 by Stephanie Glen