Genotypes at 10,000 SNPs spaced at least 20 kb apart along the genome were used to compute a pairwise distance matrix between individuals, which served as the input for classical multidimensional scaling (MDS). Individuals were projected onto the first two dimensions, which correspond to the two largest eigenvalues, and the ‘genetic distance’ to Europeans was calculated as the distance in the projected space between each individual and the center (median) of the CEU cluster. Individuals whose distance exceeded an empirically chosen threshold were regarded as ethnic outliers (Supplementary Figure 8).
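The procedure can be illustrated with a minimal NumPy sketch. It is not the code used in the study: the distance measure (mean absolute difference in allele counts), the function names, and the `threshold` argument are all assumptions made for illustration, since the text specifies neither the distance metric nor the threshold value.

```python
import numpy as np

def pairwise_distance(genotypes):
    """Assumed metric: mean absolute difference in allele counts (0/1/2).

    genotypes is an (individuals x SNPs) array. The broadcast below uses
    O(n^2 * m) memory, so it is only suitable for small illustrations.
    """
    g = np.asarray(genotypes, dtype=float)
    return np.abs(g[:, None, :] - g[None, :, :]).mean(axis=2)

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS via eigendecomposition of the
    double-centered squared distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

def distance_to_reference(coords, is_reference):
    """Euclidean distance from each individual to the coordinate-wise
    median of the reference cluster (CEU in the text)."""
    center = np.median(coords[is_reference], axis=0)
    return np.linalg.norm(coords - center, axis=1)

def flag_outliers(genotypes, is_reference, threshold):
    """Return (distance, outlier flag) per individual; threshold is hypothetical."""
    coords = classical_mds(pairwise_distance(genotypes))
    dist = distance_to_reference(coords, np.asarray(is_reference, dtype=bool))
    return dist, dist > threshold
```

Using the median rather than the mean for the cluster center keeps the reference point stable when a few reference individuals are themselves atypical. In practice the threshold would be chosen by inspecting the distribution of distances, as the text describes.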