
## Contingency Table: Overview

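Contingency tables (also called crosstabs or two-way tables) are used in statistics to summarize the relationship between two or more categorical variables. A contingency table is a special type of **frequency distribution table**, in which two variables are shown simultaneously: the categories of one variable form the rows, the categories of the other form the columns, and each cell holds the count for that combination.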
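As a rough illustration of how such a table is tallied, here is a minimal Python sketch that counts paired observations into rows and columns. The observations, the category labels, and the helper name `contingency_table` are all made up for illustration; they do not come from any study mentioned in this article.

```python
from collections import Counter

# Hypothetical paired observations: one (exposure, outcome) pair per person.
observations = [
    ("ate", "ill"), ("ate", "ill"), ("ate", "well"),
    ("did not eat", "ill"), ("did not eat", "well"), ("did not eat", "well"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # row labels (first variable)
    cols = sorted({c for _, c in pairs})   # column labels (second variable)
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = contingency_table(observations)
for label, row in zip(rows, table):
    print(label, dict(zip(cols, row)))
```

Each inner list of `table` is one row of the contingency table, in the order given by `rows`, with columns in the order given by `cols`.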

For example, a researcher might be investigating the relationship between AIDS and sexual preference. The two variables would be AIDS and SEXUAL PREFERENCE. The question is “Is there a significant relationship between AIDS and sexual preference?” A chi-square test could then be run on the table to determine if there is a relationship between the two variables.

The following contingency table shows exposure to a potential source of food-borne illness (in this case, ice-cream). From the table, you can see that 13 people in a case study ate ice cream; 17 people did not:

In the above image, there’s an Odds Ratio calculation. For more info, see: What is the Odds Ratio?
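For a 2 x 2 table, the odds ratio is the odds of the outcome in the exposed group divided by the odds in the unexposed group. The sketch below assumes the rows are exposed/unexposed and the columns are ill/not ill. The counts are hypothetical and are not the values from the ice-cream table, and `odds_ratio` is an illustrative helper name.

```python
def odds_ratio(table):
    """Odds ratio for a 2 x 2 table laid out as [[a, b], [c, d]].

    Assumed layout: rows = exposed / unexposed, columns = ill / not ill.
    """
    (a, b), (c, d) = table
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    # (a / b) / (c / d) simplifies to (a * d) / (b * c).
    return (a * d) / (b * c)

# Hypothetical counts, for illustration only.
hypothetical = [[10, 5],
                [4, 12]]
print(odds_ratio(hypothetical))  # 6.0
```

A value above 1 suggests higher odds of the outcome in the first row; a value near 1 suggests little difference between the rows.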

## Chi-Square Tests

**A chi-square test** can be conducted on contingency tables to test whether or not a relationship exists between variables. These effects are defined as relationships between rows and columns. The chi-square test statistic is:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Where “O” is the observed value, “E” is the expected value and “i” is the “ith” position in the table.

The following picture shows what your contingency table might look like with your data, plus the results from running a chi-square test on your data. A **small chi-square value** means that there is little evidence of a relationship between the categorical variables. A **large chi-square value** means that there is stronger evidence of an association between the two variables. As there is some pretty strong evidence that sexual orientation is linked to a higher risk of contracting AIDS, it’s no surprise that the chi-square value is rather high:

However, the note under the results states that “4 cells (66.7%) have expected count less than 5.” Generally, if more than 25% of the cells have an expected count below 5, the chi-square value as computed could be due to chance alone. Therefore, the results from this particular test should **not be treated as statistically significant**.
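To make the arithmetic concrete, here is a minimal Python sketch that computes the expected counts (row total × column total ÷ grand total), the chi-square statistic as the sum of (O − E)² / E over all cells, and the share of cells whose expected count falls below 5. The 2 x 3 table of counts is hypothetical and is not the data from the picture above. The function names are illustrative, and the code assumes every row and column total is nonzero.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Chi-square statistic: sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def share_of_low_expected(observed, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(observed) for e in row]
    return sum(e < threshold for e in cells) / len(cells)

# Hypothetical 2 x 3 table of observed counts, for illustration only.
observed = [[2, 3, 10],
            [1, 2, 12]]

print(expected_counts(observed))
print(round(chi_square(observed), 3))
print(f"{share_of_low_expected(observed):.1%} of cells have expected count < 5")
```

In this made-up table, four of the six expected counts are below 5, so it would trigger the same kind of warning as the note quoted above.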

## Contingency Table in Excel

Contingency tables are notoriously **labor-intensive** to produce and involve computing the expected frequency for each cell. The procedure is further complicated by the fact that you may have to make a correction for continuity if the expected cell frequency is below 5 (the correction for continuity for 2 x 2 tables is called the Yates correction). Many popular programs have the capability to make contingency tables, including Microsoft Excel (note that even in Excel, the process is quite complicated, involving the creation of pivot tables).
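The Yates correction mentioned above subtracts 0.5 from each absolute difference |O − E| before squaring. Here is a minimal Python sketch of that calculation for a 2 x 2 table. The counts are hypothetical, `yates_chi_square` is an illustrative name, and the code assumes no row or column total is zero.

```python
def yates_chi_square(table):
    """Chi-square statistic for a 2 x 2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink each absolute deviation by 0.5 before squaring.
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat

# Hypothetical counts, for illustration only.
print(round(yates_chi_square([[10, 20], [20, 10]]), 2))  # 5.4
```

For the same table, the uncorrected statistic is about 6.67, so the correction makes the test more conservative.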

A contingency table is created in Excel with the Pivot Table tool. Watch this two-part video on how to create one in Excel 2013:

Image credit: Missouri State University

