In the final step of our measurement strategy, we generate scores for the full sample, including all repeated measures, and evaluate their quality. This quality check could take a number of forms, including cross-validation against estimates obtained from a second calibration sample and convergent/divergent validity analyses with other criterion variables. For our example, we repeated the measurement-building process with a second calibration sample drawn from the pooled sample and found that the resulting scores were highly correlated across the two calibration samples. After this sensitivity analysis, we generated scores for the longitudinal data by refitting the final model to the full, pooled sample of repeated measures while holding all parameter estimates, including all necessary impact and DIF parameters, constant at the values previously obtained from the calibration sample. Once we had generated scores for all individuals and all repeated measures, we evaluated their reliability. For our depression measure, the largest standard errors were obtained at levels of depression below the mean, a common feature of measures of psychopathology, whose items tend to be most informative at elevated symptom levels (Reise & Waller, 2009).