PCA was performed within each data set and then across all data sets using FastPCA.19 PCA was restricted to high-quality, low-LD SNPs that passed the following filters: directly genotyped in all data sets; minor allele frequency (MAF) > 0.05; Hardy–Weinberg equilibrium P > 1 × 10⁻⁴; not strand ambiguous (i.e., no A/T or G/C SNPs); not in a high-LD region (MHC, chr6:25–35 Mb; chr8 inversion, chr8:7–13 Mb); and pairwise r² between SNPs < 0.2 (i.e., the PLINK option ‘--indep-pairwise 200 100 0.2’, applied twice). Within each data set, scatterplots of PCs were visually examined and outliers were removed. This process was repeated until cases and controls appeared evenly interspersed across all PC pairs.
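The per-SNP filters listed above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Snp` record, its field names, the `passes_filters` helper, and the example SNPs are all hypothetical, and treating the region boundaries as inclusive base-pair positions is an assumption. LD pruning and the PCA itself are left to PLINK and FastPCA and are not reproduced here.

```python
from dataclasses import dataclass

# High-LD regions named in the text, as (chromosome, start_bp, end_bp).
# Inclusive boundaries are an assumption of this sketch.
HIGH_LD_REGIONS = [
    ("6", 25_000_000, 35_000_000),  # MHC
    ("8", 7_000_000, 13_000_000),   # chr8 inversion
]

# Allele pairs whose strand cannot be resolved (A/T and G/C SNPs).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("GC")}


@dataclass
class Snp:
    """Hypothetical per-SNP summary record used only for illustration."""
    snp_id: str
    chrom: str
    pos: int
    a1: str
    a2: str
    maf: float
    hwe_p: float
    genotyped_in_all: bool  # directly genotyped in every data set


def passes_filters(snp: Snp) -> bool:
    """Return True if the SNP satisfies every per-SNP filter in the text."""
    if not snp.genotyped_in_all:
        return False
    if snp.maf <= 0.05:        # require MAF > 0.05
        return False
    if snp.hwe_p <= 1e-4:      # require HWE P > 1e-4
        return False
    if frozenset((snp.a1.upper(), snp.a2.upper())) in AMBIGUOUS_PAIRS:
        return False
    for chrom, start, end in HIGH_LD_REGIONS:
        if snp.chrom == chrom and start <= snp.pos <= end:
            return False
    return True


# Made-up example SNPs, one per filter.
snps = [
    Snp("rs_ok", "1", 1_000_000, "A", "G", 0.20, 0.5, True),
    Snp("rs_at", "1", 2_000_000, "A", "T", 0.20, 0.5, True),
    Snp("rs_mhc", "6", 30_000_000, "A", "G", 0.20, 0.5, True),
    Snp("rs_rare", "2", 5_000_000, "C", "T", 0.01, 0.5, True),
    Snp("rs_hwe", "3", 5_000_000, "C", "T", 0.20, 1e-6, True),
    Snp("rs_chr8_outside", "8", 20_000_000, "C", "T", 0.30, 0.5, True),
]
kept = [s.snp_id for s in snps if passes_filters(s)]
print(kept)
```

In this sketch only the SNPs that clear every filter are retained; the remaining r² < 0.2 criterion would then be applied to the retained set with the PLINK option quoted in the text.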