An analysis of the 12,898 structure inference SNPs in the original breast cancer study (NHSca-NHSco) and in the two reconstructed studies using external controls (PLCOca-NHSco and NHSca-PLCOco) showed at least three PC directions with highly significant genetic variation (Table 1). PCA with the second set of 7,017 structure inference SNPs indicated three major PCs (Tracy-Widom test P-value <0.05) in NHSca-NHSco and NHSca-PLCOco but only two major PCs in PLCOca-NHSco (Table S3). The estimated PCs along each major direction (the first three for NHSca-NHSco and NHSca-PLCOco, the first two for PLCOca-NHSco) are highly correlated with their counterparts estimated from the set of 12,898 SNPs (Spearman rank correlation coefficient >0.26 and P-value <10⁻¹⁵).
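The comparison of PCs across the two SNP sets can be illustrated with a minimal sketch of the Spearman rank correlation. This is not the code used in the study: the function names and the six subject scores below are hypothetical, and the P-value calculation is omitted. Because the sign of a principal component is arbitrary, the sketch assumes that the absolute value of the coefficient is the quantity of interest.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PC1 scores for the same six subjects, estimated from
# two different SNP sets (illustrative values, not study data).
pc1_set_a = [0.12, -0.40, 0.33, 0.05, -0.21, 0.50]
pc1_set_b = [-0.10, 0.35, -0.30, -0.12, 0.25, -0.45]

rho = spearman(pc1_set_a, pc1_set_b)
# A PC is defined only up to sign, so compare the magnitude.
print(f"|rho| = {abs(rho):.4f}")
```

In this toy example the second score vector is sign-flipped relative to the first, so the raw coefficient is negative while its magnitude is close to 1.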