Reliability vs Validity
First of all, why do we even care about validity? Whenever we question or analyze a measuring tool of any kind (a ruler, an assessment tool, a quiz, a survey, etc.), we look at 1) its reliability and 2) its validity. The more reliable and valid a measure is, the more we can trust it. A classic example is a bathroom scale. For the scale to be reliable, it should give the same weight every time. Suppose we place a dumbbell on the scale at the exact same location on Earth, with no change in altitude. If the scale reports a different weight each time, it is not very reliable.
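One way to picture reliability is to quantify how tightly repeated measurements cluster. Here is a minimal sketch in Python; the five scale readings are hypothetical numbers, not real data, and the small coefficient of variation is just one simple way to summarize consistency.

```python
# Quantifying reliability: repeated measurements of the SAME dumbbell on the
# same scale should cluster tightly. The readings below are made up.
from statistics import mean, stdev

readings_kg = [10.02, 9.98, 10.01, 10.00, 9.99]  # five weighings of one dumbbell

avg = mean(readings_kg)
spread = stdev(readings_kg)
cv_percent = 100 * spread / avg  # coefficient of variation: spread relative to the mean

print(f"mean = {avg:.2f} kg, sd = {spread:.3f} kg, CV = {cv_percent:.2f}%")
# A small CV suggests a consistent (reliable) scale;
# wildly varying readings would produce a large CV.
```

If the scale were unreliable, the same dumbbell might read 9.2 kg one time and 10.8 kg the next, and the spread would balloon.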
What would be an example of validity? If we used a bathroom scale (which measures weight) and its output numbers to measure, say, temperature, then it would be a completely invalid measuring tool for temperature. We should be using a thermometer for that.
See the post on Construct & Face Validity.
Let’s start with content validity. Content validity assesses whether a test is representative of “all aspects” of the construct it is intended to measure. To produce valid results, the content, such as survey questions, must cover all relevant parts of that subject or construct. If some aspects are missing from the measurement, it is less valid. Let’s take the Barthel Index as an example. It is intended to measure the level of independence for basic ADLs. Notice how it asks questions about each ADL comprehensively: bowel, bladder, and toilet use for toileting, plus grooming, feeding, transfers, mobility, dressing, stairs, and bathing. Now suppose the Barthel Index was intended to measure the level of independence for ADLs but neglected to include questions for, say, dressing. Its content validity would then be threatened, because it failed to consider dressing, one of the component parts of the broad construct of ADLs, when producing a final score.
Assessments that aim to “cover” or measure a construct such as autism or depression should therefore have good content validity. A depression assessment, for example, should include all the symptoms described in the latest DSM. Compare that to one based on an outdated edition, such as the DSM-I, when the “construct” of depression may have been defined or thought of differently based on the research at the time.
To remember the term content validity, think “table of CONTENTS,” which maps the entire book from beginning to end, like a novel that tells its whole story. Your occupational therapy textbook’s table of contents, for example, points to everything the book covers. A table of contents that misses some parts is not very useful.
Now let’s review criterion validity. This comes up when new tests are developed or improved, even when it’s a newer version of the same test: the new test’s results should correlate highly with those of existing tests that intend to measure the same construct.
So if I came up with my own version of the Barthel Index to measure ADL independence, let’s call it the OT Dude Index, it should have a high degree of correlation with the Barthel Index, which is highly regarded as a reliable and valid tool for measuring the level of independence for ADLs. If the OT Dude Index produces different or even opposite scores, you, as a researcher or clinician, would probably question my assessment: maybe it used a small sample size, measured something differently, or something else went wrong. A new test may have construct validity, face validity, and even content validity (for example, the OT Dude Index could ask questions about all the ADLs), but criterion validity is a major component of validity in its own right, which is why it is also important.
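“High degree of correlation” can be made concrete with a Pearson correlation coefficient. The sketch below uses entirely made-up scores for ten hypothetical patients on the Barthel Index and on the invented OT Dude Index; the data and the r threshold are illustrative assumptions, not real validation results.

```python
# Hypothetical scores for ten patients on the Barthel Index and on an invented
# "OT Dude Index". All numbers are illustrative, not real data.
from math import sqrt

barthel = [95, 80, 60, 100, 45, 70, 85, 55, 90, 65]
ot_dude = [92, 78, 63, 98, 50, 72, 88, 52, 91, 60]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(barthel, ot_dude)
print(f"r = {r:.2f}")  # r near +1 would support criterion validity;
                       # r near 0 or negative would undermine it
```

In practice, researchers report this kind of correlation (along with sample size and confidence intervals) as evidence of concurrent criterion validity against an established “gold standard” test.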
To remember the term criterion validity, think of the word criterion as critters, like many bugs. There are usually many critters (plural) in nature, not just one. So you do not look at just one critter in the wild; you compare many critters to each other, just as you compare a new test against existing ones when doing research. Hope this helps.