We reported high correlations between the IRT-based scores and the sum scores for the specific personality inventories. One might therefore argue that sum scores would serve just as well in analyses. There are, however, several reasons why the IRT approach is superior. First, the IRT approach yields less biased estimates of Neuroticism and Extraversion when not all individuals are administered exactly the same set of items, as was often the case in the cohorts because of missing data or because multiple inventories or versions were assessed. Second, the IRT approach increases measurement precision for individuals who have been assessed with multiple inventories; without fitting an IRT model, it is not clear how to weight items from different inventories. Third, with IRT, groups of individuals within cohorts who received different item sets can be compared because all individuals are scored on one common metric, provided that linking is possible. Lastly, and most importantly, the IRT approach makes explicit the extent to which item data from multiple inventories can be combined, both within and across cohorts. When sum scores for different inventories are simply analyzed separately and the results pooled, it remains unknown whether this pooling is actually appropriate.
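
The argument about differing item sets can be illustrated with a small simulation. The following is only a sketch under stated assumptions and not the analysis used in this study: it assumes a two-parameter logistic (2PL) model, invented item parameters (`a`, `b`) that are treated as known rather than estimated, and expected a posteriori (EAP) scoring with a standard normal prior. Two groups drawn from the same trait distribution each answer a different half of a hypothetical 20-item pool; all names and values are illustrative.

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL item response function, returned as a (n_theta, n_items) array."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))

def eap_scores(responses, a, b):
    """EAP trait estimates under a N(0, 1) prior; NaN marks items not administered."""
    grid = np.linspace(-4.0, 4.0, 81)          # quadrature points on the trait scale
    p = p_endorse(grid, a, b)                  # (n_grid, n_items)
    observed = ~np.isnan(responses)            # which items each person actually saw
    x = np.nan_to_num(responses)               # NaN -> 0; masked out by `observed`
    # Log-likelihood per person and grid point, using administered items only
    loglik = (x * observed) @ np.log(p).T + ((1.0 - x) * observed) @ np.log(1.0 - p).T
    logpost = loglik - 0.5 * grid**2           # add the log of the N(0, 1) prior
    post = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid                         # posterior mean per person

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 20
a = np.full(n_items, 1.5)                      # hypothetical discriminations
b = np.linspace(-2.0, 2.0, n_items)            # hypothetical difficulties, easy to hard
theta = rng.standard_normal(n_persons)         # true trait values, same distribution for all

# Simulate complete data, then blank out the items each group did not receive
responses = (rng.random((n_persons, n_items)) < p_endorse(theta, a, b)).astype(float)
half = n_persons // 2
responses[:half, 10:] = np.nan                 # group A: the 10 easier items only
responses[half:, :10] = np.nan                 # group B: the 10 harder items only

sum_scores = np.nansum(responses, axis=1)      # sum over administered items
theta_hat = eap_scores(responses, a, b)

sum_gap = sum_scores[:half].mean() - sum_scores[half:].mean()
eap_gap = theta_hat[:half].mean() - theta_hat[half:].mean()
print(f"Group difference in mean sum score: {sum_gap:.2f}")
print(f"Group difference in mean EAP score: {eap_gap:.2f}")
```

In this setup the sum scores differ between the groups only because the item sets differ in difficulty, whereas the EAP scores are expressed on the common trait metric. In practice the item parameters would have to be estimated and linked across inventories, which the sketch does not attempt.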