To compare population substructure between the two original CGEMS initial scans, PLCOca-PLCOco and NHSca-NHSco, we identified the PC directions by applying PCA to each study separately and compared the directions between the two studies using the Spearman rank correlation coefficient of the SNP loadings (Table 2). The top three PC directions of the two studies are significantly correlated (Spearman rank correlation coefficient > 0.14, P-value < 10^-15).
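The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`snp_loadings`, `compare_pc_directions`, `simulate_study`), the simulated two-population genotypes, and the choice to run PCA as an SVD of the column-centered (unstandardized) genotype matrix are all assumptions made for illustration. It assumes both studies are genotyped on the same SNPs in the same order.

```python
import numpy as np
from scipy.stats import spearmanr


def snp_loadings(genotypes, n_components=3):
    """Return SNP loadings (n_snps x n_components) for one study.

    genotypes: (n_subjects x n_snps) array of minor-allele counts (0, 1, 2).
    """
    # Center each SNP column; this sketch does not rescale by variance.
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the PC directions expressed in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def compare_pc_directions(geno_a, geno_b, n_components=3):
    """Spearman correlation of SNP loadings: PC k of study A vs PC k of study B."""
    load_a = snp_loadings(geno_a, n_components)
    load_b = snp_loadings(geno_b, n_components)
    results = []
    for k in range(n_components):
        rho, p_value = spearmanr(load_a[:, k], load_b[:, k])
        # The sign of a PC is arbitrary, so report the magnitude of rho.
        results.append((k + 1, abs(rho), p_value))
    return results


def simulate_study(rng, freqs_pop1, freqs_pop2, n_per_pop):
    """Hypothetical study: equal numbers of subjects from two subpopulations."""
    g1 = rng.binomial(2, freqs_pop1, size=(n_per_pop, freqs_pop1.size))
    g2 = rng.binomial(2, freqs_pop2, size=(n_per_pop, freqs_pop2.size))
    return np.vstack([g1, g2]).astype(float)


# Simulated allele frequencies shared by both studies (illustrative only).
rng = np.random.default_rng(0)
n_snps = 500
f1 = rng.uniform(0.1, 0.9, n_snps)
f2 = np.clip(f1 + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)

study_a = simulate_study(rng, f1, f2, n_per_pop=150)
study_b = simulate_study(rng, f1, f2, n_per_pop=150)

results = compare_pc_directions(study_a, study_b, n_components=3)
for pc, rho, p_value in results:
    print(f"PC{pc}: |Spearman rho| = {rho:.3f}, P-value = {p_value:.3g}")
```

In this simulation only one axis of shared structure is built in, so the first PC would be expected to correlate between the two simulated studies while later PCs mostly reflect noise. Real data may need additional steps (for example SNP standardization or LD pruning) that this sketch omits.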