Psychometrics 101: Scale Reliability and Validity
In order for any scientific instrument to provide measurements that can be trusted, it must be both reliable and valid. These psychometric properties are crucial for the interpretability and generalizability of the constructs being measured.
Reliability is the degree to which an instrument consistently measures a construct, both across items (e.g., internal consistency, split-half reliability) and across time points (e.g., test-retest reliability). One of the most common assessments of reliability is Cronbach’s Alpha, a statistical index of internal consistency that, in Classical Test Theory, also estimates the proportion of observed-score variance that reflects true-score variance rather than error. A general rule of thumb is that solid scientific instruments should have a Cronbach’s Alpha of at least .7. There are exceptions to this rule for brief measures whose primary goal is to recapture the breadth of content of a longer scale (see example here).
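As a minimal sketch of how the statistic is computed, the standard formula is alpha = k/(k-1) * (1 - sum of the item variances / variance of the total scores), where k is the number of items. The Python below implements that formula; the function name `cronbach_alpha`, the respondents-by-items list layout, and the ratings are illustrative assumptions rather than part of any particular package or dataset.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    scores: list of rows, one per respondent, each holding k item scores.
    Uses sample variances (n - 1); the ratio is identical with population variances.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))                 # transpose: one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical 1-5 ratings from five respondents on four items (illustrative only).
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(ratings), 3))
```

When items move in lockstep, the item variances account for only a small share of the total-score variance and alpha approaches 1; when items are uncorrelated, the two quantities match and alpha falls to 0.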
Below is an example of a reliability analysis for a Recreational Shopping scale. The analysis summarizes how well the items in the scale work together to measure a person’s propensity for recreational shopping. In this example, the overall reliability statistic is .732. The analysis also shows how each individual item performs by reporting statistics such as the corrected item-total correlation and Cronbach’s Alpha if the item were deleted. In the example below, item #4 is a strong item: it has a high corrected item-total correlation (it correlates strongly with the other items), and the overall reliability would drop substantially if it were deleted from the scale. These and other metrics all contribute to understanding what makes a survey reliable.
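The two item-level statistics described above can be sketched as follows, assuming a hypothetical list-of-columns layout (one list of scores per item). The corrected item-total correlation correlates each item with the sum of the other items, so the item is not correlated with itself, and alpha-if-deleted recomputes alpha with that item removed. The function names and scores are illustrative, and the Pearson correlation is written out by hand to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def alpha_from_columns(items):
    """Cronbach's alpha where items is a list of k columns (one per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]   # each respondent's total score
    return k / (k - 1) * (1 - sum(variance(c) for c in items) / variance(totals))

def item_analysis(items):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    results = []
    for j, col in enumerate(items):
        rest = [c for i, c in enumerate(items) if i != j]   # all other items
        rest_totals = [sum(row) for row in zip(*rest)]
        r_corrected = pearson(col, rest_totals)
        # Alpha is only defined when at least two items remain.
        alpha_deleted = alpha_from_columns(rest) if len(rest) >= 2 else float("nan")
        results.append((r_corrected, alpha_deleted))
    return results

# Hypothetical scores for three items across five respondents (illustrative only).
items = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [3, 1, 4, 1, 5],   # a weaker, noisier item
]
for n, (r, a) in enumerate(item_analysis(items), start=1):
    print(f"item {n}: corrected r = {r:.3f}, alpha if deleted = {a:.3f}")
```

In this made-up data the third item shows the opposite pattern to item #4 in the example: it has the lowest corrected item-total correlation, and deleting it would raise alpha rather than lower it.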
As demonstrated in the video linked above, a measure can be reliable without being valid, but it cannot be valid without being reliable. Building on reliability, validity is the degree to which an instrument measures what it purports to measure. For example, suppose a researcher gave Samantha a paper-and-pencil Extraversion survey. How would the researcher know that the computed score on that survey actually reflected Samantha’s true level of Extraversion?
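Classical Test Theory gives the "cannot be valid without being reliable" claim a precise form. Under its assumptions, an observed correlation equals the correlation between true scores attenuated by measurement error in both instruments, so the correlation between an instrument X and any criterion Y is capped by their reliabilities r_{XX'} and r_{YY'}:

```latex
r_{XY} = \rho_{T_X T_Y}\,\sqrt{r_{XX'}\,r_{YY'}}
\qquad\Longrightarrow\qquad
|r_{XY}| \le \sqrt{r_{XX'}\,r_{YY'}}
\quad \text{(since } |\rho_{T_X T_Y}| \le 1\text{)}
```

As a rough illustration, treating the .732 from the Recreational Shopping example as that scale's reliability and assuming a perfectly reliable criterion (r_{YY'} = 1), no validity coefficient for the scale could exceed sqrt(.732), or about .86. Because alpha is usually regarded as a lower-bound estimate of reliability, this ceiling is approximate.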
There are several forms of validity. Some of the most commonly assessed are content validity, construct validity, and criterion validity. Content validity is the degree to which an instrument covers the full breadth of the construct. For example, does the survey capture all the facets of Extraversion (e.g., gregarious, outgoing, active)?
Construct validity is the degree to which an instrument measures the operationalized or latent construct it is intended to measure. Two important subcomponents of construct validity are convergent validity (the degree to which two instruments that measure the same construct are correlated; generally, the higher the better) and discriminant validity (the degree to which measures of unrelated constructs are correlated; generally, the lower the better). The validity analysis reported below shows the convergent validity of TipTap Lab’s Image Selection Task (IST) of Recreational Shopping with the original paper-and-pencil Recreational Shopping scale (a scale that has been previously validated and cited in numerous research studies).
This statistic can be interpreted like any correlation: the closer it is to 1, the stronger the relationship. So if the correlation is high, as it is below, convergence is strong. Convergent validity is a particularly important statistic at TipTap Lab because we use this methodology to convert long paper-and-pencil measures (all previously validated in external research contexts) into short, engaging image-based measures. It is critical for us to recapture the psychometric properties of the original scales.
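A convergent validity coefficient of this kind is an ordinary Pearson correlation between respondents' scores on the two instruments. A minimal sketch follows; the scores are made-up values for six hypothetical respondents, not TipTap Lab data, and `pearson_r` is an illustrative helper rather than a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six respondents (illustrative only).
ist_scores   = [2.1, 3.4, 4.0, 1.5, 3.8, 2.9]   # short image-based measure
paper_scores = [2.4, 3.1, 4.2, 1.8, 3.9, 2.6]   # original paper-and-pencil scale
print(round(pearson_r(ist_scores, paper_scores), 3))
```

A discriminant validity check would apply the same function to scores from a measure of an unrelated construct, where a coefficient near zero is the desired result.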
Lastly, criterion validity (including both predictive and concurrent validity) is the degree to which an instrument predicts, or is associated with, known related behaviors or constructs. For instance, if Samantha scored high on the Extraversion scale, previous research tells us she should be more likely than an introvert to attend a party or talk to a stranger. In the course of our research, criterion validity is continually evaluated as more constructs and behavioral outcomes are studied.
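When the criterion is a yes/no behavior, such as whether a person attended a party, one common way to quantify criterion validity is the point-biserial correlation: the Pearson correlation between the scale score and the outcome coded 0/1. The sketch below uses the standard formula r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean scores of the two groups, s is the population standard deviation of all scores, and p and q are the proportions coded 1 and 0. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(scores, outcome):
    """Point-biserial r between continuous scores and a 0/1 outcome.

    Equivalent to the Pearson correlation with the outcome coded 0/1.
    """
    group1 = [s for s, y in zip(scores, outcome) if y == 1]
    group0 = [s for s, y in zip(scores, outcome) if y == 0]
    if not group1 or not group0:
        raise ValueError("outcome must contain both 0s and 1s")
    p = len(group1) / len(scores)          # proportion who showed the behavior
    s = pstdev(scores)                     # population SD of all scores
    return (mean(group1) - mean(group0)) / s * sqrt(p * (1 - p))

# Hypothetical Extraversion scores and whether each person later attended a party (1 = yes).
extraversion = [4.5, 2.0, 3.8, 1.5, 4.1, 2.7, 3.3, 1.9]
attended     = [1,   0,   1,   0,   1,   0,   1,   0]
print(round(point_biserial(extraversion, attended), 3))
```

A positive coefficient means higher scale scores go with a higher rate of the behavior. For a predictive design the outcome is recorded after the scale is administered; for a concurrent design it is recorded at about the same time.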
At TipTap Lab, we employ advanced psychometric techniques to build the most reliable and valid measurements possible. We are constantly iterating on our process and improving both our items and our methodology.