
## What is the Assumption of Independence?

The assumption of independence is used for t-tests, ANOVA tests, and several other statistical tests. It’s essential to getting results from your sample that reflect what you would find in a population. If you violate this assumption, even a small dependence in your data can lead to heavily biased results, and that bias may be undetectable.

A **dependence** is a connection between your data. For example, how much you earn *depends* upon how many hours you work. **Independence** means there isn’t a connection. For example, how much you earn isn’t connected to what you ate for breakfast. The assumption of independence means that your data isn’t connected in any way (at least, in ways that you haven’t accounted for in your model).

There are actually two assumptions:

- The **observations between groups** should be independent, which basically means the groups are made up of different people. You don’t want one person appearing in two different groups, as it could skew your results.
- The observations **within each group** must be independent. If two or more data points in one group are connected in some way, this could also skew your results. For example, let’s say you were taking a snapshot of how many donuts people ate, and you took snapshots every morning at 9, 10, and 11 a.m. You might conclude that office workers eat 25% of their daily calories from donuts. However, you made the mistake of timing the snapshots too closely together in the morning, when people were more likely to bring bags of donuts in to share (making the measurements *dependent*). If you had taken your measurements at 7 a.m., noon, and 4 p.m., this would probably have made your measurements independent.
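
The between-groups condition is one of the few parts of this assumption you can check mechanically. Here is a minimal Python sketch, assuming each observation carries a participant ID; the function name `shared_subjects` and the IDs are hypothetical and used only for illustration.

```python
def shared_subjects(group_a_ids, group_b_ids):
    """Return the IDs that appear in both groups (sorted for stable output)."""
    return sorted(set(group_a_ids) & set(group_b_ids))


# Hypothetical participant IDs for two groups.
control = ["p01", "p02", "p03", "p04"]
treatment = ["p05", "p03", "p06", "p07"]

overlap = shared_subjects(control, treatment)
print(overlap)  # ['p03'] -> this person is in both groups
```

An empty list only tells you that nobody was counted in both groups. It says nothing about subtler connections, such as the timing problem in the donut example.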

## What happens if you violate the Assumption of Independence?

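In simple terms, if you violate the assumption of independence, you run the risk that your results will be wrong. Tests that assume independence treat every observation as a separate piece of information. When observations are connected, your sample carries less information than its size suggests, so standard errors tend to come out too small and p-values can look more significant than they really are.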
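
To get a feel for the size of the problem, here is a rough simulation sketch in Python (standard library only). It is an illustration under stated assumptions, not a definitive analysis: both groups are drawn from the same population, so any "significant" difference is a false positive; observations are grouped into clusters that share a random offset; and the test is a simple large-sample z-style comparison with a 1.96 cutoff. All function names, cluster sizes, and parameter values are hypothetical choices.

```python
import math
import random
import statistics


def simulate_group(n_clusters, cluster_size, cluster_sd, rng):
    """Draw one group; values in the same cluster share a random offset."""
    values = []
    for _ in range(n_clusters):
        offset = rng.gauss(0.0, cluster_sd)  # shared effect creates dependence
        values.extend(offset + rng.gauss(0.0, 1.0) for _ in range(cluster_size))
    return values


def naive_test_rejects(a, b, critical=1.96):
    """Two-sample comparison that treats every value as independent."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return abs(z) > critical


def false_positive_rate(cluster_sd, n_sims=1000, seed=0):
    """Share of simulations that find a 'difference' when none exists."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = simulate_group(10, 10, cluster_sd, rng)
        b = simulate_group(10, 10, cluster_sd, rng)
        rejections += naive_test_rejects(a, b)
    return rejections / n_sims


independent_rate = false_positive_rate(cluster_sd=0.0)  # no shared offset
dependent_rate = false_positive_rate(cluster_sd=1.0)    # strong clustering
print(f"independent data: {independent_rate:.3f}")
print(f"clustered data:   {dependent_rate:.3f}")
```

With independent data, the false positive rate should land near the nominal 5%. With clustered data under these assumptions, it should come out several times higher, even though the test itself has not changed.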

## How do I Avoid Violating the Assumption?

Unfortunately, looking at your data and trying to see if you have independence or not is usually difficult or impossible. The key to avoiding violating the assumption of independence is to make sure your data is independent *while you are collecting it*. If you aren’t an expert in your field, this can be challenging. However, you may want to look at previous research in your area and see how the data was collected.

