Performance Validity
The assessment of performance validity has become standard practice in forensic and neuropsychological evaluations in both adult and pediatric populations (Brooks, Ploetz, & Kirkwood, 2016; Heilbronner et al., 2009; Holcomb, 2018; Martin, Schroeder, & Odland, 2015). The forced-choice task on the CVLT-II and CVLT3 was developed as an embedded measure of performance validity. In addition, Lichtenstein, Holcomb, and Erdodi (2018) presented data on a forced-choice measure developed for use with the CVLT-C. Several other scores across all three instruments have also shown value as indicators of performance validity, including recognition discriminability, Trials 1–5 Correct, Long-Delay Cued Recall, and Yes/No Recognition Hits (Bauer, Yantz, Ryan, Warden, & McCaffrey, 2005; Brooks & Ploetz, 2015; Shura, Miskey, Rowland, Yoash-Gantz, & Denning, 2016; Whiteside et al., 2015). Low scores on these measures do not, by themselves, indicate invalid performance; they suggest the possibility of symptom exaggeration or of other factors that could influence performance.
RESEARCH FOUNDATION
Standardization and Psychometric Properties
Before evaluating the reliability data on the CVLT editions, it is important to note that estimating reliability poses particular difficulties for measures of learning and recall. Measures of internal reliability do not accurately describe the reliability of memory measures because item scores are interdependent: recalling one word on a trial influences the recall of other words on that trial and also increases the likelihood of recalling the same word on later trials. For this reason, measures of test-retest reliability or alternate-form reliability provide greater insight into the reliability of memory measures, although they are influenced by practice effects. Error measures and other scores with limited variability also produce lower reliabilities because their distributions are skewed. These limitations of traditional measures of reliability should be considered when interpreting the reliabilities described for the CVLT-C, CVLT-II, and CVLT3.
CVLT-C: The standardization sample for the CVLT-C consisted of 920 children selected to form a representative sample of the U.S. population, based on March 1988 U.S. Census data. It was stratified by age, sex, race/ethnicity, education level, and geographic region. Twelve normative age bands were created, each spanning 1 year of age. Each age band for ages 5–12 included 80 children, and each band for ages 13–16 included 70 children. Sex was roughly equal within each age group; all other demographic variables roughly matched the U.S. Census data.
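The band sizes can be checked against the reported total. The following is a minimal sketch, assuming the twelve one-year bands run from age 5 through age 16; the variable names are illustrative only.

```python
# One-year normative age bands, assumed to run from age 5 through age 16.
band_sizes = {age: 80 for age in range(5, 13)}          # ages 5-12: 80 children each
band_sizes.update({age: 70 for age in range(13, 17)})   # ages 13-16: 70 children each

n_bands = len(band_sizes)
total_children = sum(band_sizes.values())

# Should agree with the twelve bands and 920 children reported for the sample.
print(n_bands, total_children)
```

Eight bands of 80 and four bands of 70 account for the full standardization sample.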
Because responses on word list recall are interdependent, the CVLT-C evaluated internal consistency using three approaches: comparing overall performance on odd- and even-numbered learning trials, across-semantic category consistency, and across-word consistency. The odd-even and across-word approaches yielded average correlations of 0.88 and 0.83, respectively, for Trials 1–5. The across-semantic category approach yielded an average correlation of 0.72. Detailed information on how these consistency estimates were defined and derived is provided in the CVLT-C Manual.
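To make the odd-even idea concrete, the sketch below correlates each examinee's total on odd-numbered trials with the total on even-numbered trials. It is a generic illustration, not the procedure defined in the CVLT-C Manual: the data are simulated, and the function names, the maximum score of 15, and the simulation parameters are all assumptions introduced for this example.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def odd_even_correlation(trial_scores):
    """Correlate odd-trial totals (Trials 1, 3, 5) with even-trial totals (Trials 2, 4).

    trial_scores holds one list of five learning-trial scores per examinee.
    """
    odd_totals = [t[0] + t[2] + t[4] for t in trial_scores]
    even_totals = [t[1] + t[3] for t in trial_scores]
    return pearson_r(odd_totals, even_totals)

def simulate_examinee(n_trials=5, max_score=15):
    """Hypothetical examinee: a stable ability level plus trial-to-trial noise."""
    ability = random.gauss(0, 2)
    scores = []
    for trial in range(n_trials):
        expected = 5 + 1.5 * trial + ability           # made-up learning curve
        observed = round(expected + random.gauss(0, 1))
        scores.append(min(max(observed, 0), max_score))
    return scores

random.seed(0)  # fixed seed so the simulated sample is reproducible
sample = [simulate_examinee() for _ in range(200)]
r_odd_even = odd_even_correlation(sample)
print(round(r_odd_even, 2))
```

Because the same underlying ability drives every trial in this simulation, the two half-scores correlate strongly. The sketch does not reproduce the across-word or across-semantic category approaches, whose definitions are given in the manual.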
The test-retest sample consisted of 106 children tested between 10 and 42 days apart. Results are reported for three age groups: 8-, 12-, and 16-year-olds. Memory and learning measures are particularly susceptible to practice effects that lower test-retest correlations (Strauss, Sherman, & Spreen, 2006) because examinees are repeatedly exposed to the stimuli to be recalled. Stability coefficients for 13 CVLT-C scores are listed in Rapid Reference 1.6. Test-retest coefficients ranged from 0.61 to 0.73 for the Trials 1–5 T score, from 0.26 to 0.77 for the recall z-scores, and from 0.17 to 0.90 for the error z-scores.
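The reported ranges can be recovered from the coefficients in Rapid Reference 1.6. The sketch below transcribes the relevant rows and takes the minimum and maximum within each family of scores. The grouping of rows into "recall" and "error" families is an assumption made for this illustration, and the helper name `coefficient_range` is hypothetical.

```python
# Test-retest coefficients transcribed from Rapid Reference 1.6 (ages 8, 12, 16).
# Assigning rows to "recall" and "error" families is an assumption of this sketch.
trials_1_5 = {"List A Trials 1-5 Total": (0.73, 0.73, 0.61)}
recall = {
    "List B Free Recall Total": (0.59, 0.26, 0.66),
    "Short-Delay Free Recall": (0.40, 0.77, 0.48),
    "Short-Delay Cued Recall": (0.75, 0.49, 0.59),
    "Long-Delay Free Recall": (0.59, 0.62, 0.60),
    "Long-Delay Cued Recall": (0.69, 0.69, 0.59),
}
error = {
    "Perseverations": (0.90, 0.32, 0.31),
    "Free-Recall Intrusions": (0.74, 0.56, 0.85),
    "Cued-Recall Intrusions": (0.59, 0.17, 0.74),
}

def coefficient_range(rows):
    """Return (lowest, highest) coefficient across all ages for a family of scores."""
    values = [r for coefficients in rows.values() for r in coefficients]
    return min(values), max(values)

for label, rows in [("Trials 1-5", trials_1_5), ("recall", recall), ("error", error)]:
    low, high = coefficient_range(rows)
    print(f"{label}: {low:.2f} to {high:.2f}")
# Trials 1-5: 0.61 to 0.73
# recall: 0.26 to 0.77
# error: 0.17 to 0.90
```

Under this grouping, the lowest and highest values in each family match the ranges quoted in the text.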
CVLT-II: The standardization sample for the CVLT-II consisted of 1,087 individuals selected to form a representative sample of the U.S. population, based on the March 1999 U.S. Census data. It was stratified based on age, sex, race/ethnicity, education level, and geographic region. Seven normative age bands were created: 16–19, 20–29, 30–44, 45–59, 60–69, 70–79, and 80–89. Each age band included between 107 and 200 individuals. Sex was evenly represented for ages 16–59; in ages 60–89 more females were included than males, reflecting the sex distribution in the population at the older ages.
Internal consistency was evaluated using the three approaches introduced with the CVLT-C: comparing overall performance on odd- and even-numbered learning trials, across-semantic category consistency, and across-trial word consistency. The odd-even and across-semantic category approaches yielded average correlations of 0.94 and 0.83, respectively, for Trials 1–5. The across-trial word consistency approach yielded an average correlation of 0.79. Estimates obtained in a clinical sample of 124 neuropsychiatric patients produced similar reliability coefficients. Detailed information on how these consistency estimates were defined and derived is provided in the CVLT-II Manual.
Stability Coefficients for 13 CVLT-C Scores, by Age
Score | Age 8 average test-retest r₁₂ | Age 12 average test-retest r₁₂ | Age 16 average test-retest r₁₂ |
List A Trials 1–5 Total | 0.73 | 0.73 | 0.61 |
List B Free Recall Total | 0.59 | 0.26 | 0.66 |
Short-Delay Free Recall | 0.40 | 0.77 | 0.48 |
Short-Delay Cued Recall | 0.75 | 0.49 | 0.59 |
Long-Delay Free Recall | 0.59 | 0.62 | 0.60 |
Long-Delay Cued Recall | 0.69 | 0.69 | 0.59 |
Semantic Cluster Ratio | 0.56 | 0.58 | 0.53 |
Perseverations | 0.90 | 0.32 | 0.31 |
Free-Recall Intrusions | 0.74 | 0.56 | 0.85 |
Cued-Recall Intrusions | 0.59 | 0.17 | 0.74 |
Recognition Hits | 0.38 | 0.24 | 0.80 |
Discriminability | 0.55 | 0.37 |