Agreement was good in most comparisons (κ > 0.7) but lower for comparisons involving branched source 4 (κ between 0.5 and 0.7). Source 4 systematically under-called cases relative to the other sources, apparently because of the wording of its screening question. The logic for the combined phenotype partially mitigates this tendency by preferentially using responses from sources 1 to 3 when they are available.
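To make the two ideas above concrete, the sketch below computes a pairwise κ between two sources and applies a simple source-precedence rule. It assumes that κ is Cohen's kappa on binary case (1) / non-case (0) calls for the same participants, and that the combined phenotype takes the first available call in the order 1, 2, 3, then 4. Both the precedence order within sources 1 to 3 and the function names `cohens_kappa` and `combined_call` are hypothetical illustrations, not the study's actual implementation, and the input lists are toy data.

```python
from typing import Dict, Optional, Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) calls on the same participants."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of participants given the same call by both sources.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement, from each source's marginal case rate.
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        # Both sources give one identical constant call; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def combined_call(calls: Dict[int, Optional[int]]) -> Optional[int]:
    """Hypothetical precedence rule: use sources 1-3 first, source 4 only as a fallback."""
    for source in (1, 2, 3, 4):
        call = calls.get(source)
        if call is not None:  # None marks a missing response
            return call
    return None  # no source answered


if __name__ == "__main__":
    # Toy data: source_4 under-calls one case that source_1 reports.
    source_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    source_4 = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    print(round(cohens_kappa(source_1, source_4), 2))  # 0.78
    print(combined_call({1: None, 2: 1, 4: 0}))  # 1: source 2 overrides source 4
```

With this rule a source 4 "non-case" only reaches the combined phenotype when sources 1 to 3 are all missing, which is one way the under-calling could be limited without discarding source 4 entirely.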