Reliability vs Validity
First of all, why do we even care about validity? Whenever we question or analyze a measuring tool, whether it is a ruler, an assessment tool, a quiz, or a survey, we look at 1) its reliability and 2) its validity. The more reliable and valid a measure is, the more we can trust it. A classic example is a bathroom scale. For it to be reliable, it should give the same weight every time for the same object. If we put, say, a dumbbell on the scale several times, at the exact same location on Earth with no change in altitude, and it gives us a different weight every time, then it’s not very reliable.
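To make the idea of reliability a bit more concrete, here is a minimal Python sketch. The readings are made-up numbers for illustration, and using the standard deviation of repeated readings as a stand-in for reliability is an assumption of this sketch, not a formal definition.

```python
from statistics import pstdev

def reading_spread(readings):
    """Standard deviation of repeated readings of the same object."""
    return pstdev(readings)

# Hypothetical readings (in pounds) of the same dumbbell on two scales.
steady_scale = [20.0, 20.0, 20.1, 20.0, 19.9]
erratic_scale = [20.0, 23.5, 17.2, 21.8, 18.4]

steady_spread = reading_spread(steady_scale)
erratic_spread = reading_spread(erratic_scale)

# A smaller spread means the scale agrees with itself more closely.
print(f"steady scale spread:  {steady_spread:.2f} lb")
print(f"erratic scale spread: {erratic_spread:.2f} lb")
```

The smaller the spread, the more reliable the scale. Notice that this says nothing yet about whether the numbers are the right numbers.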
What would be an example of validity? If we used a bathroom scale (which measures weight) and read its output numbers as, say, temperature, then it would be a completely invalid measuring tool for temperature. We should be using a thermometer for that.
While validity can be easy to understand, it can be divided into many types that help determine how valid a measure really is. The overarching and broadest type of validity is construct validity. A construct is central to establishing the overall validity of a research method. Something like gravity, intelligence, or depression can be thought of as a construct because it is complex, involves multiple dimensions, and is not directly observed.
So constructs can’t be directly observed, but they can be measured by observing other indicators that are associated with them. When you add the word validity, construct validity asks: does the test measure the “construct” that it’s intended to measure?
Let’s take the construct intelligence. You can’t look at someone and say they are intelligent. If you develop a questionnaire to determine intelligence, the term construct validity asks if the questionnaire really measures the construct of intelligence. To remember the term “construct” validity, think of construct as construction paper. When you make an art project out of construction paper, you are turning the paper into something new, like a piece of art that is more “abstract” in concept and that represents “intelligence”.
The next simplest type of validity is “face validity”. Face validity is whether something “looks like” or “appears to” measure what it is intended to measure.
Notice the difference between construct and face validity. Construct validity is whether something “really” measures the construct, say, depression. Face validity is whether it “appears to” measure depression, judging by its face or surface.
Researchers don’t consider face validity strong evidence of validity because it is “superficial” and also subjective (rather than objective, which is believed to be more important for some types of research). To remember face validity, imagine someone who is very superficial and judges a person’s personality by their appearance (or face) before even getting to know them. However, we all know that looks aren’t everything and that it’s what’s on the inside that counts.
However, in many textbooks and websites, you’ll find that the authors don’t really go into face validity and move on to other types of validity such as content or criterion validity. So why bother with face validity in the first place?
That’s because face validity and face value are very important. Sometimes, if a measure does not have face validity, you should question whether it is an appropriate measure.
Let’s use the scale example again.
So you take two people: one is a baby and the other is an adult. You weigh each of them separately. Construct validity tells us that the scale reads 0 pounds when nothing is on it, and that when an object such as a person is placed on it, the number changes, so it is indeed measuring weight…somehow, but that’s all we know so far. This is similar to pushing down on the scale with your hand and watching the number change: okay, the scale is responding to the construct of weight somehow. So we decide to test more of its validity using face validity. What would face validity tell you to expect? That the baby weighs less than the adult according to the scale’s output. But when you actually weigh the baby, the scale gives you 500 pounds. And when the adult steps on the scale, it gives you 20 pounds. This tells us that the scale has poor face validity, and you would question how valid this scale really is.
To make the distinction between “reliability” and “validity”, you re-weigh the baby and the adult several times. Every time, the baby weighs 500 pounds and the adult 20 pounds. So you know the scale is very reliable, down to the decimal point. But it’s completely invalid based on face validity, though not necessarily based on construct validity. And this is a simple, subjective, “surface”-level judgment that you can make from intuition without even knowing statistics.
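The baby-and-adult example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings are the hypothetical numbers from the example, and `is_reliable`, `passes_face_check`, and the 0.5 pound tolerance are names and values made up for this sketch.

```python
from statistics import mean, pstdev

# Hypothetical repeated readings (in pounds) from the scale in the example.
baby_readings = [500.0, 500.0, 500.0, 500.0]
adult_readings = [20.0, 20.0, 20.0, 20.0]

def is_reliable(readings, tolerance=0.5):
    """Reliable here means repeated readings barely vary (assumed tolerance)."""
    return pstdev(readings) <= tolerance

def passes_face_check(baby, adult):
    """Intuition says a baby should weigh less than an adult."""
    return mean(baby) < mean(adult)

reliable = is_reliable(baby_readings) and is_reliable(adult_readings)
plausible = passes_face_check(baby_readings, adult_readings)

print(f"reliable: {reliable}")
print(f"matches intuition: {plausible}")
```

The scale passes the consistency check and fails the intuition check, which is exactly the “reliable but not valid” situation described above.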
Oftentimes, examples of face validity are based on the questions asked, such as whether the questions are subjective or how they are phrased. And yes, this is a form of face validity: if a test is asking about racial prejudice, for example, its questions should pertain to race and ethnicity. If it is instead asking about, say, a math equation, then it has poor face validity. However, another way you should look at face validity is through the “results” you get from a measure and your intuition of what you expect them to be.
Hiding Face Validity
Does this mean that “all” measures need to have face validity? Not necessarily. Say an assessment is meant to help clinically diagnose someone with a mental illness. You may not want the person being assessed to be able to tell what is being assessed from the questions. Why? Because if, for example, someone wants to avoid being diagnosed with schizophrenia, they may “catch on” to what the questions are getting at and answer differently on purpose to avoid being “diagnosed” with the condition.
Therefore, some measures may want to mask or hide their intentions and have little or no face validity, while others should intuitively have face validity.