
Chunk #34 — Methods — MVP datasets. — Ethics statement:

Source
Genome-wide meta-analysis of problematic alcohol use in 435,563 individuals yields insights into biology and relationships with other traits.

Text

We removed subjects whose genotypic and phenotypic sex did not match, and randomly removed one subject from each pair of related individuals (kinship coefficient [47] threshold = 0.0884), leaving 107,438 phase 2 subjects for subsequent analyses. We defined EAs using the same procedure as in MVP phase 1. First, we ran principal components analysis (PCA) with FastPCA [48] on 74,827 common SNPs (MAF > 0.05) shared by MVP and the 1000 Genomes phase 3 reference panels. We then assigned each participant to the nearest reference population, based on the Euclidean distance between the participant and the center of each of the 5 reference populations in the space of the first 10 PCs. A second PCA was performed on participants assigned to the reference European population (EUR), and participants were removed as outliers if any of their first 10 PCs was > 3 standard deviations from the mean, leaving 67,268 EA subjects.
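The ancestry assignment and outlier filter described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the PCs have already been computed (for example by FastPCA), and that each population "center" is the mean PC vector of its reference samples. The function names, the input layout (one list of PC values per sample), and the use of the sample standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def population_centers(ref_pcs, ref_labels):
    """Mean PC vector (centroid) of each reference population.

    Assumption: the "center" of a population is the per-PC mean of its
    reference samples.
    """
    grouped = {}
    for pcs, label in zip(ref_pcs, ref_labels):
        grouped.setdefault(label, []).append(pcs)
    return {
        label: [statistics.fmean(col) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def assign_nearest_population(sample_pcs, centers):
    """Label each sample with the population whose centroid is closest
    in Euclidean distance."""
    return [
        min(centers, key=lambda label: math.dist(pcs, centers[label]))
        for pcs in sample_pcs
    ]


def within_sd_mask(pcs_rows, n_sd=3.0):
    """True for samples whose every PC lies within n_sd standard
    deviations of that PC's mean; False marks an outlier.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    columns = list(zip(*pcs_rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        all(abs(x - m) <= n_sd * s for x, m, s in zip(row, means, sds))
        for row in pcs_rows
    ]


if __name__ == "__main__":
    # Toy 2-PC data with hypothetical labels; the text uses the first 10 PCs.
    ref_pcs = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
    ref_labels = ["EUR", "EUR", "AFR", "AFR"]
    centers = population_centers(ref_pcs, ref_labels)
    print(assign_nearest_population([[0.1, 0.1], [9.0, 9.5]], centers))
```

In the workflow described in the text, the second function would be applied to the first 10 PCs of all participants, and the third to the first 10 PCs from the second PCA, restricted to participants assigned to EUR.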