We removed subjects whose genotypic and phenotypic sex did not match and randomly removed one subject from each pair of related individuals (kinship coefficient [47] threshold = 0.0884), leaving 107,438 phase2 subjects for subsequent analyses. We defined EAs using the same procedure as in MVP phase1. First, we ran principal components analysis (PCA) with FastPCA [48] on 74,827 common SNPs (MAF > 0.05) shared by MVP and the 1000 Genomes phase 3 reference panels. We then assigned each participant to the nearest reference population, based on the Euclidean distance between the participant and the center of each of the 5 reference populations in the space of the first 10 PCs. A second PCA was performed on the participants assigned to the reference European population (EUR), and participants were removed as outliers if any of their first 10 PCs was more than 3 standard deviations from the mean, leaving 67,268 EA subjects.
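The population assignment and outlier filter can be illustrated with a minimal sketch, assuming the PCs have already been computed (for example, by FastPCA) and are stored as NumPy arrays with one row per sample. The function names, the array layout, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; this is not the code used in the study.

```python
import numpy as np


def assign_nearest_population(pcs, ref_pcs, ref_labels, n_pcs=10):
    """Assign each sample to the reference population with the nearest center.

    pcs        : (n_samples, >= n_pcs) PC scores of study participants
    ref_pcs    : (n_ref, >= n_pcs) PC scores of reference-panel samples
    ref_labels : (n_ref,) population label of each reference sample
    (hypothetical layout; the paper does not specify one)
    """
    pcs = np.asarray(pcs, dtype=float)[:, :n_pcs]
    ref_pcs = np.asarray(ref_pcs, dtype=float)[:, :n_pcs]
    ref_labels = np.asarray(ref_labels)

    # Center of each reference population = mean of its samples' PCs.
    populations = np.unique(ref_labels)
    centers = np.stack(
        [ref_pcs[ref_labels == pop].mean(axis=0) for pop in populations]
    )

    # Euclidean distance from every participant to every population center.
    dists = np.linalg.norm(pcs[:, None, :] - centers[None, :, :], axis=2)
    return populations[dists.argmin(axis=1)]


def within_sd_mask(pcs, n_pcs=10, n_sd=3.0):
    """Return True for samples whose first n_pcs PCs all lie within n_sd SDs.

    Assumes the sample standard deviation (ddof=1); the paper does not say
    which estimator was used.
    """
    pcs = np.asarray(pcs, dtype=float)[:, :n_pcs]
    z = np.abs(pcs - pcs.mean(axis=0)) / pcs.std(axis=0, ddof=1)
    return (z <= n_sd).all(axis=1)
```

In this sketch, `within_sd_mask` would be applied to the PCs from the second PCA, computed only on participants labeled `EUR` by `assign_nearest_population`; samples where the mask is `False` correspond to the removed outliers.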