Variants shared between the 1000 Genomes Project (1KGP) phase 3 panel (2,504 samples, 26 populations) and the post-QC S4S genotypes were merged. Regions of high LD were excluded (Price et al., 2006, 2008), and the shared variants were then pruned (r² < 0.1) using PLINK 1.9 (Purcell et al., 2007; Chang et al., 2015) (--indep-pairwise 1500 150 0.1), yielding 109,259 semi-independent variants for ancestry analyses. EIGENSOFT/SmartPCA (Patterson et al., 2006; Price et al., 2006) was used to perform PCA on the 1KGP phase 3 reference panel alone to determine SNP weights for each eigenvector. The S4S samples were then projected onto this solution to generate 10 principal components (PCs).
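The reference-then-project logic can be illustrated with a minimal NumPy sketch. This is not SmartPCA itself: it assumes complete genotypes coded as 0/1/2 allele counts, standardizes each variant using reference-panel allele frequencies, and uses a plain SVD. The function names (`fit_reference_pca`, `project`) and the simulated matrices are hypothetical and stand in for the 1KGP and S4S data.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=10):
    """Fit PCA on the reference panel only and return per-variant weights.

    ref_genotypes: (n_samples, n_variants) array of 0/1/2 allele counts.
    """
    ref = np.asarray(ref_genotypes, dtype=float)
    mean = ref.mean(axis=0)              # per-variant mean allele count
    p = mean / 2.0                       # reference allele frequency
    scale = np.sqrt(p * (1.0 - p))       # binomial-style scaling (an assumption)
    scale[scale == 0] = 1.0              # avoid dividing by zero at monomorphic sites
    z = (ref - mean) / scale
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[:n_components].T        # (n_variants, n_components) SNP weights
    return mean, scale, weights


def project(genotypes, mean, scale, weights):
    """Project samples onto reference-derived eigenvectors."""
    z = (np.asarray(genotypes, dtype=float) - mean) / scale
    return z @ weights


# Simulated stand-ins for the reference panel and the study cohort.
rng = np.random.default_rng(0)
ref = rng.binomial(2, 0.3, size=(50, 200))
target = rng.binomial(2, 0.3, size=(20, 200))

mean, scale, weights = fit_reference_pca(ref, n_components=10)
ref_pcs = project(ref, mean, scale, weights)
target_pcs = project(target, mean, scale, weights)
```

Because the weights are estimated from the reference panel alone, the study samples do not influence the axes; they are only placed in the coordinate system that the reference populations define.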