The resulting items and sub-items (created to account for differential item functioning) were then calibrated and scored using MULTILOG (Thissen, 1991). Using the two-parameter logistic (2PL) item response theory model, we estimated discrimination and severity parameters for all items and sub-items and used these parameters to compute maximum a posteriori (MAP; Thissen & Wainer, 2001) scores for each observation of externalizing symptoms across all waves and reporters. The resulting scores take into account differences in item parameters as a function of age, gender, and study, as identified in the differential item functioning analyses, and can be interpreted on a z-score metric. (The z-score metric is relative to the mean and standard deviation of the calibration sample as a whole rather than to each age period assessed.) These scores served as the outcomes of interest in all subsequent analyses.
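
To make the scoring step concrete, the following is a minimal sketch of maximum a posteriori (MAP) scoring under the 2PL model. It is illustrative only and does not reproduce MULTILOG's estimation routines. It assumes a standard normal prior on the latent trait (consistent with a z-score metric, but an assumption here) and dichotomous item responses; the function names `p_endorse` and `map_score` and all item parameter values are hypothetical. Under these assumptions, differential item functioning would be handled by passing each observation the parameter set of the sub-items that apply to its age, gender, and study.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item.

    theta: latent externalizing score; a: discrimination; b: severity.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def map_score(responses, items, tol=1e-8, max_iter=100):
    """MAP estimate of theta via Newton-Raphson, assuming a N(0, 1) prior.

    responses: list of 0/1 values, or None for items not administered.
    items: list of (a, b) tuples, one per response (hypothetical values).
    """
    theta = 0.0  # start at the prior mean
    for _ in range(max_iter):
        # First and second derivatives of the log prior, -theta^2 / 2.
        grad = -theta
        hess = -1.0
        for x, (a, b) in zip(responses, items):
            if x is None:
                continue  # missing items contribute nothing to the likelihood
            p = p_endorse(theta, a, b)
            grad += a * (x - p)
            hess -= a * a * p * (1.0 - p)
        # Newton step, clamped to keep early iterations from overshooting.
        step = max(-1.0, min(1.0, grad / hess))
        theta -= step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Hypothetical (discrimination, severity) pairs for three items.
    example_items = [(1.2, -0.5), (1.8, 0.0), (0.9, 1.0)]
    for pattern in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]):
        print(pattern, round(map_score(pattern, example_items), 3))
```

Because the prior contributes to the posterior even when no items are endorsed, every response pattern, including all-zero and all-one patterns, receives a finite score, which is one practical reason to prefer MAP over maximum likelihood scoring for symptom data.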