Statistically significant DIF does not necessarily have a large practical effect. To assess how much DIF changes the resulting scores depending on which calibration is used, Neuroticism and Extraversion scores were estimated under different cohort-specific calibrations and compared. For example, how much would the estimated scores for individuals in the Dutch NTR sample differ if the Finnish HBCS calibration were used instead of the NTR calibration (i.e., item parameters estimated from NTR data)? If measurement invariance holds perfectly, the score estimates obtained under different calibrations should correlate very close to 1. These correlations were computed for the NEO-FFI, NEO-PI-R and EPQ inventories in the appropriate cohorts.
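
The comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the text here does not specify the IRT model or the scoring method, so the sketch assumes a dichotomous two-parameter logistic (2PL) model with expected a posteriori (EAP) scoring, and the actual inventories may call for a polytomous model. All item parameters and responses are simulated, and the names `eap_scores`, "calibration A" and "calibration B" are hypothetical stand-ins for two cohort-specific calibrations.

```python
import numpy as np

def eap_scores(responses, a, b, n_quad=61):
    """EAP trait estimates under an assumed 2PL model with a N(0, 1) prior."""
    theta = np.linspace(-4.0, 4.0, n_quad)          # quadrature grid
    prior = np.exp(-0.5 * theta**2)                 # unnormalised N(0, 1) density
    # Endorsement probability per grid point and item: (n_quad, n_items)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    # Log-likelihood of each response pattern at each grid point: (n_persons, n_quad)
    loglik = responses @ np.log(p).T + (1.0 - responses) @ np.log(1.0 - p).T
    post = np.exp(loglik) * prior
    post /= post.sum(axis=1, keepdims=True)         # normalise the posterior
    return post @ theta                             # posterior mean per person

rng = np.random.default_rng(0)
n_persons, n_items = 500, 12

# Hypothetical calibration A (e.g. item parameters estimated in the own cohort)
a_A = rng.uniform(0.8, 2.0, n_items)
b_A = rng.normal(0.0, 1.0, n_items)

# Hypothetical calibration B: same items, perturbed parameters to mimic DIF
a_B = a_A * np.exp(rng.normal(0.0, 0.15, n_items))
b_B = b_A + rng.normal(0.0, 0.3, n_items)

# Simulated responses generated under calibration A
true_theta = rng.normal(size=n_persons)
p_true = 1.0 / (1.0 + np.exp(-a_A * (true_theta[:, None] - b_A)))
responses = (rng.random((n_persons, n_items)) < p_true).astype(float)

# Score the same individuals under both calibrations and correlate the estimates
scores_A = eap_scores(responses, a_A, b_A)
scores_B = eap_scores(responses, a_B, b_B)
r = np.corrcoef(scores_A, scores_B)[0, 1]
print(f"correlation between calibrations: {r:.3f}")
```

In this sketch the same response patterns are scored twice, so any departure of `r` from 1 reflects only the difference between the two sets of item parameters, which is the quantity of interest in the comparison.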