Statistics How To

Contingency Table: What is it used for?

Contingency Table: Overview


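Contingency tables (also called crosstabs or two-way tables) are used in statistics to summarize the relationship between two or more categorical variables. A contingency table is a special type of frequency distribution table. In its most common, two-way form, two variables are shown simultaneously: one defines the rows, the other defines the columns, and each cell holds the number of observations that fall into that combination of categories.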
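As a rough illustration of how the cell counts arise, the sketch below tallies a two-way table from raw observations using only the Python standard library. The data and the function name `contingency_table` are made up for this example and do not come from any study mentioned here.

```python
from collections import Counter

# Hypothetical raw data: one (exposure, outcome) pair per person.
observations = [
    ("ate ice cream", "ill"),
    ("ate ice cream", "ill"),
    ("ate ice cream", "well"),
    ("no ice cream", "ill"),
    ("no ice cream", "well"),
    ("no ice cream", "well"),
]


def contingency_table(pairs):
    """Return (row_labels, col_labels, counts) for a two-way table.

    counts[i][j] is the number of observations whose first variable equals
    row_labels[i] and whose second variable equals col_labels[j].
    """
    cell_counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    counts = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, counts


row_labels, col_labels, counts = contingency_table(observations)
for label, row in zip(row_labels, counts):
    # Print each row of the table with its column labels.
    print(label, dict(zip(col_labels, row)))
```

Each printed row corresponds to one category of the first variable, with the counts split by the categories of the second variable.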

For example, a researcher might be investigating the relationship between AIDS and sexual preference. The two variables would be AIDS and SEXUAL PREFERENCE. The question is “Is there a significant relationship between AIDS and sexual preference?” A chi-square test could then be run on the table to test whether the two variables are associated.

The following contingency table shows exposure to a potential source of food-borne illness (in this case, ice cream). From the table, you can see that 13 people in a case study ate ice cream; 17 people did not:

[Image: contingency table of ice cream exposure. Credit: Michigan Dept. of Agriculture]


In the above image, there’s an Odds Ratio calculation. For more info, see: What is the Odds Ratio?
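For readers who want to see the arithmetic, here is a minimal sketch of an odds ratio for a 2 x 2 table. The cell counts are hypothetical: the row totals (13 exposed, 17 unexposed) match the text, but the split into ill and not ill is invented for illustration and is not taken from the image. The function name `odds_ratio` is likewise just a label chosen for this example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as:

                    ill    not ill
        exposed      a        b
        unexposed    c        d

    It is the odds of illness among the exposed (a / b) divided by the
    odds of illness among the unexposed (c / d), i.e. (a * d) / (b * c).
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts (assumed for illustration only).
example = odds_ratio(a=10, b=3, c=5, d=12)
print(example)
```

An odds ratio above 1 suggests the exposure is associated with higher odds of illness; a value near 1 suggests no association.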

Chi-Square Tests

A chi-square (χ²) test can be conducted on a contingency table to test whether or not a relationship exists between the row variable and the column variable. The chi-square statistic is:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Where “O” is the observed value, “E” is the expected value and “i” is the “ith” cell in the table. The following picture shows what your contingency table might look like with your data, plus the results from running a chi-square test on your data. A small chi-square value means the observed counts are close to what you would expect if the categorical variables were unrelated. A large chi-square value, judged against the table's degrees of freedom, is evidence of an association between the two variables. As there is some pretty strong evidence that sexual orientation is linked to a higher risk of contracting AIDS, it’s no surprise that the chi-square value is rather high:

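[Image: contingency table with chi-square test results]

However, the note under the results states that “4 cells (66.7%) have expected count less than 5.” The chi-square test relies on a large-sample approximation, and a common rule of thumb is that no more than about 20% of cells should have an expected count below 5 (some sources use 25%). When that limit is exceeded, the reported chi-square value and its p-value are unreliable. Therefore, the results from this particular test should not be treated as evidence of statistical significance; an exact test, or combining sparse categories, is the usual remedy.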

Contingency Table in Excel

Chi-square tests on contingency tables are notoriously labor-intensive to carry out by hand, because they involve computing the expected frequency for each cell. The procedure is further complicated by the fact that you may have to make a correction for continuity if the expected cell frequency is below 5 (the correction for continuity for 2 x 2 tables is called the Yates correction). Many popular programs have the capability to make contingency tables, including Microsoft Excel (note that even in Excel, the process is quite complicated, involving the creation of pivot tables).
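As a hedged sketch of what the Yates correction does for a 2 x 2 table, the function below uses the standard shortcut formula N(|ad - bc| - N/2)² divided by the product of the four marginal totals. The cell counts are hypothetical, and the name `yates_chi_square` is chosen for this example only. The adjustment is capped at zero so that it never pushes the statistic past the point of no difference, which is an implementation choice assumed here.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square statistic with Yates continuity correction for a
    2 x 2 table with cells [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shrink |ad - bc| by n / 2, but never below zero.
    adjusted = max(0.0, abs(a * d - b * c) - n / 2)
    return n * adjusted ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts (assumed for illustration only).
corrected = yates_chi_square(10, 3, 5, 12)
print(round(corrected, 3))
```

The corrected value is always less than or equal to the uncorrected chi-square for the same table, which makes the test more conservative.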

A contingency table is created in Excel with the Pivot Table tool. Watch this two-part video on how to create one in Excel 2013:

Image credit: Missouri State University

------------------------------------------------------------------------------

Contingency Table: What is it used for? was last modified: October 15th, 2017 by Stephanie Glen

2 thoughts on “Contingency Table: What is it used for?”

  1. Bryn Greer-Wootten

    In the example the % of expected frequencies less than 5 is 67%. Normally if this % is greater than 25%
    we would say that the chi-square value as computed could be due to chance alone and would disregard the result in terms of statistical significance level.