Chunk #52 — Methods — Population differentiation

Source: Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations.
Embedded: yes

Text

To differentiate population groups, we performed principal components analysis (PCA) with FastPCA in EIGENSOFT (ref. 45), using common SNPs (MAF > 0.05) shared between MVP [pruned for linkage disequilibrium (LD) at r² > 0.2] and the 1000 Genomes phase 3 reference panels for European (EUR), African (AFR), admixed American (AMR), East Asian (EAS), and South Asian (SAS) populations. A total of 80,871 SNPs present in both MVP and 1000 Genomes were used in the PCA. Using the first 10 PCs, we calculated the Euclidean distance between each participant and the center of each of the five reference populations (i.e., across all subjects) and assigned each participant to the nearest reference population. This identified 242,317 European American (EA); 61,762 African American (AA); 15,864 Hispanic and Latino American (LA); 1565 East Asian American (EAA); and 228 South Asian American (SAA) subjects. A second PCA, run within each group, yielded the first 10 PCs for that group. Participants with a score >3 standard deviations from the mean on any of the 10 PCs were removed as outliers, leaving 209,020 EA; 57,340 AA; 14,425 LA; 1410 EAA; and 196 SAA subjects. Within genetically
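
The assignment and outlier-removal steps described above can be illustrated with a minimal Python/NumPy sketch. This is not the authors' code: the function names, the array layout (rows are subjects, columns are the first 10 PCs), and the reading of each population "center" as the mean PC vector of its reference subjects are assumptions made for illustration. The PCs themselves are assumed to have been computed beforehand (for example, with FastPCA).

```python
import numpy as np


def assign_to_nearest_reference(sample_pcs, ref_pcs, ref_labels):
    """Assign each sample to the reference population with the nearest center.

    Assumption: the "center" of a population is the mean PC vector of its
    reference subjects. All names here are hypothetical.
    """
    sample_pcs = np.asarray(sample_pcs, dtype=float)
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_labels = np.asarray(ref_labels)

    populations = np.unique(ref_labels)  # sorted unique labels
    centers = np.stack(
        [ref_pcs[ref_labels == pop].mean(axis=0) for pop in populations]
    )

    # Euclidean distance from every sample to every center:
    # resulting shape is (n_samples, n_populations).
    diffs = sample_pcs[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)

    return populations[dists.argmin(axis=1)]


def within_group_inlier_mask(group_pcs, n_sd=3.0):
    """Return True for subjects within n_sd SDs of the mean on every PC.

    Intended to be applied to the PCs from the second, within-group PCA.
    """
    group_pcs = np.asarray(group_pcs, dtype=float)
    z = (group_pcs - group_pcs.mean(axis=0)) / group_pcs.std(axis=0, ddof=1)
    return (np.abs(z) <= n_sd).all(axis=1)
```

In use, `assign_to_nearest_reference` would be called once on the joint MVP and 1000 Genomes PCs, and `within_group_inlier_mask` once per assigned group on that group's own PCs. The sample standard deviation (`ddof=1`) is a choice made here; the source does not state which estimator was used.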