What is Content Validity?
Content validity (also called logical or rational validity) refers to the extent to which a test or assessment instrument evaluates all aspects of a behavior, construct, or topic it is designed to measure. In other words, we’re asking if the test fully covers the subject matter:
- High content validity indicates that the test thoroughly spans the topic.
- Lower results suggest that the test fails to include relevant parts of the topic.
When developing a test or survey that assesses knowledge in a specific subject area, you’ll want to ask some important questions about the scope of assessment. For example:
- If you’re creating an educational assessment, does the test cover all the topics you want it to? Let’s say you want to design a test to evaluate high school students’ understanding of math. Content validity evaluates whether the test adequately covers the appropriate material for that subject area and level of expertise. In other words, does the test include all relevant aspects of the content area, or are concepts missing?
- If you’re creating a psychological scale, does the instrument include all the dimensions related to the psychological construct? For example, let’s say the construct is self-esteem. Does the scale include all dimensions, such as self-worth, self-acceptance, and self-respect? Or are there missing dimensions? (In this example, a missing dimension might be self-efficacy.)
Content Validity Example
More formally, the “something” a test is trying to measure is called a construct. A construct can be almost anything. Simple constructs include height, weight, and IQ. More complicated constructs include the ability to perform well at a certain job, competency in wide-ranging subject areas such as physics or U.S. history, and the ability to evaluate another person’s psychiatric condition. Evaluating content validity is essential in many real-life scenarios to ensure that a test comprehensively assesses the full range of knowledge, or the full set of dimensions, of the construct it targets. Examples include:
- Driver license tests.
- Medical licensing exams.
- Standardized tests such as the SAT and GRE.
- Scales used in psychology to assess anger management.
Examples of measurements that are content valid:
- Height (construct) measured in centimeters (measurement).
- AP Physics knowledge (construct) measured by the AP exam (measurement).
Examples of measurements that have debatable content validity:
- The Bar Exam has been criticized as a poor measure of the ability to practice law.
- IQ tests have been criticized as a poor way to measure intelligence.
How to Measure Content Validity
Measuring content validity involves assessing individual questions to determine whether they accurately target the characteristics the instrument is meant to cover. This process evaluates how well the test matches its objectives and the theoretical properties of the construct. The goal is to systematically analyze each item’s contribution, making sure no aspect is overlooked. A common tool for this procedure is factor analysis, which identifies and measures latent (hidden) variables. There are two main types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA):
- EFA is commonly used when there isn’t a well-defined theoretical model for latent variables. It helps to find patterns in the data that might suggest latent variables are present.
- CFA is often used when there is a clear theoretical model for latent variables. It assesses whether the data fits the specified theoretical model. For example, CFA can be used to determine if a set of test scores conforms to a theoretical model of intelligence consisting of mathematical, spatial, and verbal ability.
Factor analysis can be used to identify the number of underlying dimensions covered by the test items. The procedure helps determine whether the items collectively measure an adequate number and variety of factors. If the results of the factor analysis show that the measurement instrument lacks coverage of important dimensions, you should improve the instrument and then re-run the factor analysis.
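As a quick illustration of this dimensionality check, the sketch below uses the Kaiser criterion (count the eigenvalues of the item correlation matrix that exceed 1), a rule of thumb commonly used alongside EFA. The data are synthetic and all names are illustrative, not taken from any particular instrument:

```python
# Sketch: checking how many latent dimensions a set of items covers,
# via the Kaiser criterion on the item correlation matrix.
import numpy as np

rng = np.random.default_rng(42)
n_respondents = 1000

# Two latent abilities drive six items: items 0-2 load on the first
# factor, items 3-5 on the second, plus independent noise.
latent = rng.normal(size=(n_respondents, 2))
loadings = np.array([
    [0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7],
])
items = latent @ loadings + rng.normal(scale=0.4, size=(n_respondents, 6))

corr = np.corrcoef(items, rowvar=False)        # 6x6 item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted, largest first
n_factors = int(np.sum(eigenvalues > 1))       # Kaiser criterion

print(n_factors)  # 2: the check recovers the two underlying dimensions
```

If the count came back lower than the number of dimensions the instrument is supposed to cover, that would be a signal to revise the items and repeat the analysis.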
Another way to assess content validity is with the content validity ratio [1]. With this approach, subject matter experts determine whether an item is essential to assess the knowledge or skill it represents. The items are labeled “essential,” “useful but not necessary,” or “not necessary.” The method is a form of inter-rater reliability regarding the item’s importance. Ideally, you want all or most experts to agree that each item is “essential.” The content validity ratio (CVR) can quantify this agreement:
CVR = (Ne – N/2) / (N/2)
Where:
- Ne = Number of panel members giving “essential” ratings for an item
- N = Number of panel members.
This formula yields values ranging from -1 (perfect disagreement) to +1 (perfect agreement) for each question. Values above 0 indicate that more than half of the experts rated the item essential. Lawshe [1] suggested the transformation (from proportion to content validity ratio) was worthwhile because it makes it easy to see whether the level of agreement among panel members is greater than 50%. However, it is important to consider whether the observed agreement could occur just by chance [2]. Critical values for the CVR, which depend on the number of experts on the panel, can help determine significance.
Panel Size | N_critical | Proportion Agreeing Essential | CVR_critical |
---|---|---|---|
5 | 5 | 1 | 1.00 |
6 | 6 | 1 | 1.00 |
7 | 7 | 1 | 1.00 |
8 | 7 | 0.875 | 0.750 |
9 | 8 | 0.889 | 0.778 |
10 | 9 | 0.900 | 0.800 |
11 | 9 | 0.818 | 0.636 |
12 | 10 | 0.833 | 0.667 |
13 | 10 | 0.769 | 0.538 |
14 | 11 | 0.786 | 0.571 |
15 | 12 | 0.800 | 0.600 |
16 | 12 | 0.750 | 0.500 |
17 | 13 | 0.765 | 0.529 |
18 | 13 | 0.722 | 0.444 |
19 | 14 | 0.737 | 0.474 |
20 | 15 | 0.750 | 0.500 |
21 | 15 | 0.714 | 0.429 |
22 | 16 | 0.727 | 0.455 |
23 | 16 | 0.696 | 0.391 |
24 | 17 | 0.708 | 0.417 |
25 | 18 | 0.720 | 0.440 |
26 | 18 | 0.692 | 0.385 |
27 | 19 | 0.704 | 0.407 |
28 | 19 | 0.679 | 0.357 |
29 | 20 | 0.690 | 0.379 |
30 | 20 | 0.667 | 0.333 |
31 | 21 | 0.677 | 0.355 |
32 | 22 | 0.688 | 0.375 |
33 | 22 | 0.667 | 0.333 |
34 | 23 | 0.676 | 0.353 |
35 | 23 | 0.657 | 0.314 |
36 | 24 | 0.667 | 0.333 |
37 | 24 | 0.649 | 0.297 |
38 | 25 | 0.658 | 0.316 |
39 | 26 | 0.667 | 0.333 |
40 | 26 | 0.650 | 0.300 |
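Lawshe’s formula and the critical-value lookup can be sketched in a few lines of Python. The function names are illustrative, and the dictionary copies only a few rows from the table above:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: +1 when all experts rate the item essential,
    0 when exactly half do, -1 when none do."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# A few critical values from the table above (panel size -> CVR_critical).
CVR_CRITICAL = {5: 1.00, 8: 0.75, 10: 0.80, 15: 0.60, 20: 0.50, 30: 0.333, 40: 0.30}

def item_is_valid(n_essential, n_experts):
    """True if the item's CVR meets the critical value for this panel size
    (only panel sizes present in CVR_CRITICAL are supported)."""
    return content_validity_ratio(n_essential, n_experts) >= CVR_CRITICAL[n_experts]

# Example: 9 of 10 experts rate an item essential.
print(content_validity_ratio(9, 10))  # 0.8
print(item_is_valid(9, 10))           # True: 0.8 meets the 0.80 cutoff
```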
The content validity index (CVI), calculated as the mean of the CVRs of all test items, provides an overall measure of the test’s content validity. For example, suppose a panel of five experts (N = 5) rates a five-item test, with the following counts of “essential” ratings per item:

Item | Ne | CVR |
---|---|---|
1 | 5 | 1.00 |
2 | 4 | 0.60 |
3 | 3 | 0.20 |
4 | 2 | -0.20 |
5 | 1 | -0.60 |

The CVI for this set of data is CVI = (1.00 + 0.60 + 0.20 – 0.20 – 0.60)/5 = 0.20, which corresponds to an average of 60% of the experts rating the items essential. Agreement above about 70% is generally considered “good,” but the acceptable level largely depends on the field you’re working in. For example, medical testing may require higher CVIs than a job training assessment. Both CVI and CVR are important measures of content validity, but CVI is generally considered the more informative summary: it reflects agreement across all items, giving an overall picture of the test’s content validity. To summarize, CVR differentiates between necessary and unnecessary questions, although it does not identify any missing aspects.
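The CVI computation can be sketched as a small helper, assuming Lawshe’s CVR formula and a fixed panel size (the function names are illustrative):

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for a single item."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def content_validity_index(essential_counts, n_experts):
    """CVI = mean of the per-item CVRs."""
    cvrs = [content_validity_ratio(ne, n_experts) for ne in essential_counts]
    return sum(cvrs) / len(cvrs)

# Five items rated essential by 5, 4, 3, 2, and 1 of 5 experts.
cvi = content_validity_index([5, 4, 3, 2, 1], 5)
print(round(cvi, 2))  # 0.2
```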
Differences with Face Validity and Internal Consistency
Content validity is similar to face validity, but the two use different approaches to checking validity. Face validity is an informal check: anyone can look at a test at its “face value” and say it looks good. Content validity uses a more formal, systematic approach, usually involving experts in the field who judge how well the questions cover the material.

Content validity and internal consistency are also related, but they are not the same thing. Content validity is how well an instrument (i.e., a test or questionnaire) measures a theoretical construct. Internal consistency measures how well a subset of test items or questions measures a particular characteristic or variable in the model. For example, you might have a ten-question customer satisfaction survey with three questions that test for “overall satisfaction with phone service.” Testing those three questions together is an example of checking internal consistency; taking the whole survey and making sure it measures “customer satisfaction” is an example of content validity.
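Unlike content validity, internal consistency is typically checked numerically, most often with Cronbach’s alpha. A minimal sketch, using made-up scores for three hypothetical phone-service questions:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)          # per-item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)      # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Five customers answer three phone-service questions on a 1-5 scale;
# similar answers across the items produce a high alpha.
scores = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(scores), 2))  # 0.92
```

A high alpha says only that these three items hang together as a scale; it says nothing about whether the full survey covers every dimension of “customer satisfaction,” which is the content validity question.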
References
- [1] Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575.
- [2] Ayre, C., & Scally, A. J. (2014). Critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 47(1), 79–86. doi:10.1177/0748175613513808