We included only one sample from each set of duplicates and one from each pair of samples identified as full siblings by an initial relatedness scan in PLINK (ref. 39). We investigated population structure by PCA of all autosomal SNPs that passed QC, and included only samples of European ancestry (Supplementary Fig. 9). We excluded samples with gender misidentification, detected by examining the mean intensities of SNP probes on the X and Y chromosomes. We also excluded samples with a missing call rate ≥ 2% and samples on two plates that showed extremely high mean inbreeding coefficients. A total of 2,400 (HPFS), 3,265 (NHS) and 8,682 (ARIC) samples were retained for analysis, respectively, giving a combined set of 14,347 samples.
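
The sample-level exclusions described above can be sketched as a single filter. This is a minimal illustration, not the pipeline actually used: the `Sample` record, its field names, and the functions `passes_qc` and `retained_counts` are hypothetical, and the per-sample flags are assumed to have been computed upstream (for example from the relatedness scan, the PCA, and the X/Y intensity check). Only the 2% missing call rate threshold and the cohort totals are taken from the text.

```python
from dataclasses import dataclass

# Threshold from the text: samples with missing call rate >= 2% are excluded.
MAX_MISSING_RATE = 0.02


@dataclass
class Sample:
    """Hypothetical per-sample QC record; all flags are assumed precomputed."""
    sample_id: str
    cohort: str                # "HPFS", "NHS" or "ARIC"
    missing_rate: float        # fraction of missing genotype calls
    is_european: bool          # ancestry assignment from the PCA
    sex_mismatch: bool         # X/Y probe intensities disagree with recorded sex
    is_redundant_relative: bool  # the dropped member of a duplicate set or sibling pair
    on_flagged_plate: bool     # plate with extremely high mean inbreeding coefficient


def passes_qc(s: Sample) -> bool:
    """Return True if the sample survives every exclusion criterion."""
    return (
        s.is_european
        and not s.sex_mismatch
        and not s.is_redundant_relative
        and not s.on_flagged_plate
        and s.missing_rate < MAX_MISSING_RATE
    )


def retained_counts(samples):
    """Count retained samples per cohort."""
    counts = {}
    for s in samples:
        if passes_qc(s):
            counts[s.cohort] = counts.get(s.cohort, 0) + 1
    return counts


# Cohort totals reported in the text, with a check that they sum to the combined set.
REPORTED = {"HPFS": 2400, "NHS": 3265, "ARIC": 8682}
assert sum(REPORTED.values()) == 14347
```

Note that the filter uses a strict inequality, so a sample with a missing call rate of exactly 2% is excluded, matching the "≥ 2%" criterion.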