To assess whether personality scores could be compared across cohorts, the latent scores in each cohort were estimated repeatedly, each time using item parameters calibrated in a different cohort. That is, the same pattern of item responses was scored once with the item parameters calibrated in one cohort and again with the item parameters calibrated in another cohort, and the resulting latent trait estimates were correlated. These correlations (see Supplementary Tables 4 and 5) are generally very high: most exceed 0.95, only 3 of the 84 fall below 0.90, and the lowest is 0.81. Thus, the ranking of individuals is not much affected by which cohort's calibration is used.
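The cross-calibration check can be illustrated with a minimal sketch. It is not the model or software used in the study: it assumes dichotomous items, a two-parameter logistic (2PL) IRT model, expected a posteriori (EAP) scoring with a standard normal prior, and simulated data. The function `eap_scores` and the parameter sets `a_A`, `b_A`, `a_B`, `b_B` (standing in for calibrations from two cohorts) are hypothetical names introduced only for this example.

```python
import numpy as np

def eap_scores(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP latent trait estimates under an assumed 2PL model, N(0, 1) prior."""
    # Probability of endorsing each item at each grid point: (n_grid, n_items)
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    # Log-likelihood of each response pattern at each grid point: (n_persons, n_grid)
    loglik = responses @ np.log(p).T + (1 - responses) @ np.log(1 - p).T
    log_post = loglik - 0.5 * grid**2  # add log prior, up to a constant
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid  # posterior mean per person

rng = np.random.default_rng(0)
n_persons, n_items = 500, 12

# Hypothetical item parameters "as calibrated in cohort A"
a_A = rng.uniform(0.8, 2.0, n_items)   # discriminations
b_A = rng.normal(0.0, 1.0, n_items)    # difficulties

# Hypothetical parameters "as calibrated in cohort B": perturbed versions of A
a_B = a_A * np.exp(rng.normal(0.0, 0.15, n_items))
b_B = b_A + rng.normal(0.0, 0.3, n_items)

# Simulated response patterns (generated here under calibration A)
theta_true = rng.normal(0.0, 1.0, n_persons)
p_true = 1.0 / (1.0 + np.exp(-a_A * (theta_true[:, None] - b_A)))
responses = (rng.random((n_persons, n_items)) < p_true).astype(float)

# Score the SAME response patterns under each calibration, then correlate
theta_A = eap_scores(responses, a_A, b_A)
theta_B = eap_scores(responses, a_B, b_B)
r = np.corrcoef(theta_A, theta_B)[0, 1]
print(f"Correlation between calibrations: {r:.3f}")
```

A correlation close to 1 in such a comparison indicates that the rank ordering of individuals is nearly the same under both calibrations. It does not by itself show that the two sets of scores are on the same scale, since a correlation is insensitive to shifts in location and scale.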