What is Classical Test Theory?
Classical Test Theory (CTT), sometimes called the true score model, is the body of mathematics behind creating and analyzing tests and measurement scales. The goal of CTT is to improve tests, particularly their reliability and validity.
Reliability implies consistency: if you take the ACT five times, you should get roughly the same results each time. A test is valid if it measures what it’s supposed to measure.
It’s called “classical” because it predates Item Response Theory, a more modern framework.
Classical Test Theory assumes that each person has an innate true score. It can be summed up with an equation:

X = T + E

where:
- X is an observed score,
- T is the true score,
- E is random error.
For example, let’s assume you know exactly 70% of all the material covered in a statistics course. This is your true score (T). A perfect end-of-semester test (which doesn’t exist) would reflect this true score exactly. In reality, you’re likely to score around 65% to 75%. The discrepancy of up to 5 percentage points from your true score is the error (E).
The errors are assumed to be normally distributed with a mean of zero. Hypothetically, if you took the test an infinite number of times, your average observed score would equal your true score.
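The true score model can be illustrated with a quick simulation. This is a minimal sketch with made-up numbers: a true score of 70 and a normally distributed error with standard deviation 3 are assumptions, not values from the text.

```python
import random
import statistics

random.seed(42)

TRUE_SCORE = 70.0  # T: the examinee's innate true score (percent), assumed

# Simulate many retakes: each observed score X = T + E, with E ~ Normal(0, 3)
observed = [TRUE_SCORE + random.gauss(0, 3) for _ in range(100_000)]

mean_observed = statistics.mean(observed)
print(round(mean_observed, 1))  # close to 70.0 -- the errors average out
```

Because the errors have mean zero, the average of many simulated observed scores converges on the true score, which is exactly the "infinite retakes" thought experiment above.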
Statistics Used in Classical Test Theory
Classical test theory is a collection of many statistics, including the average score, item difficulty, and the test’s reliability.
1. Correlation
Correlation: shows how two variables X and Y are related to each other. Different measures are used for different test types. For example, a dichotomously scored test (e.g. yes/no answers) would be correlated with the point-biserial correlation, while a polytomously scored test (one with multiple possible score points) would use the Pearson correlation coefficient.
2. Covariance
Covariance: a measure of how much two random variables vary together. It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.
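As a quick illustration of the item-level case, the point-biserial correlation for a dichotomous item is Pearson’s r computed with the item coded 0/1. The item responses and total scores below are hypothetical, made up for the sketch.

```python
import statistics

def pearson(xs, ys):
    """Pearson's r: sample covariance divided by the product of standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data: 0/1 scores on one dichotomous item vs. total test score
item = [1, 0, 1, 1, 0, 1, 0, 1]
total = [85, 52, 78, 90, 60, 82, 55, 70]

# Point-biserial correlation = Pearson's r with the item coded 0/1
r = pearson(item, total)
print(round(r, 2))
```

A high positive value here means test takers who got this item right also tended to score well overall.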
3. Discrimination Index
Discrimination Index: the ability of the test to discriminate between different levels of learning or other concepts of interest. A high discrimination index indicates the test is able to differentiate between levels.
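One common way to compute an item’s discrimination index is the upper–lower group method: compare the proportion answering correctly in the top-scoring group against the bottom-scoring group. The sketch below uses hypothetical (total score, item correct) pairs and the conventional 27% group size, which is an assumption, not something stated above.

```python
# Hypothetical data: (total_score, item_correct) pairs for 12 test takers
results = [(95, 1), (90, 1), (88, 1), (80, 1), (75, 0), (70, 1),
           (65, 0), (60, 1), (55, 0), (50, 0), (45, 0), (40, 0)]

results.sort(key=lambda r: r[0], reverse=True)
k = max(1, round(len(results) * 0.27))  # size of upper and lower groups (27% rule)
upper = results[:k]
lower = results[-k:]

# D = proportion correct in upper group minus proportion correct in lower group
d = sum(correct for _, correct in upper) / k - sum(correct for _, correct in lower) / k
print(round(d, 2))  # 1.0 -- the item separates high and low scorers perfectly
```

Values near 1 indicate strong discrimination; values near 0 (or negative) flag items that don’t distinguish stronger from weaker test takers.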
4. Item Difficulty
Item difficulty: a measure of an individual test question’s difficulty, defined as the proportion of test takers who answered the question correctly. For example, an item difficulty of 0.89 means that 89 out of 100 test takers answered correctly.
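Computing item difficulty is a simple proportion per item. The response matrix below (rows are test takers, columns are items, 1 = correct) is made up for the sketch.

```python
# Hypothetical 0/1 responses: 5 test takers x 3 items
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
]

n = len(responses)
# Item difficulty p = number who answered correctly / number of test takers
difficulties = [sum(item) / n for item in zip(*responses)]
print(difficulties)  # [0.8, 0.8, 0.2] -- item 3 is the hardest
```

Note the naming quirk: a *higher* difficulty value means an *easier* item, since it is the proportion who got it right.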
5. Reliability Coefficient
Reliability coefficient: a measure of how consistently the test measures achievement. Several methods exist for calculating the coefficient, including test-retest, parallel (alternate) forms, and internal consistency analysis. Rules of thumb for preferred levels of the coefficient:
For high stakes tests (e.g. college admissions), > 0.85.
For low stakes tests (e.g. classroom assessment), > 0.70.
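Internal consistency is often estimated with Cronbach’s alpha: alpha = k/(k−1) × (1 − sum of item variances / variance of total scores). A minimal sketch on made-up 0/1 response data (the scores are hypothetical, and alpha is just one of the methods named above):

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(item_scores[0])
    item_vars = [statistics.variance(col) for col in zip(*item_scores)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 6 test takers x 4 items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

An alpha around 0.74 would clear the low-stakes rule of thumb above but fall short of the 0.85 expected for high-stakes testing.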
6. Sample Variance / Standard Deviation
The sample variance and sample standard deviation are measures of how spread out the scores are.
7. Standard Error of Measurement
Standard Error of Measurement (SEm): a measure of how much measured test scores are spread around a “true” score.
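In CTT the SEM is computed from the score standard deviation and the reliability coefficient: SEM = SD × sqrt(1 − reliability). The observed scores and the reliability value below are assumptions for the sketch.

```python
import math
import statistics

# Hypothetical observed test scores and an assumed reliability coefficient
scores = [62, 68, 70, 71, 74, 75, 78, 80, 84, 88]
reliability = 0.85

# SEM = SD_x * sqrt(1 - r_xx): higher reliability means a smaller SEM
sem = statistics.stdev(scores) * math.sqrt(1 - reliability)
print(round(sem, 2))
```

A common use: a rough 95% band for a test taker’s true score is the observed score plus or minus about two SEMs.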