Chunk #12 — Methods — eMERGE genetic data

Source: Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations.

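The ancestry-assignment procedure described in the paragraph that follows can be sketched in Python. The steps are LD pruning, PCA fitted on the 1KG reference samples, projection of eMERGE samples into that PC space, and assignment to a super-population. This is a minimal, non-authoritative sketch under several assumptions. Genotypes are NumPy dosage matrices (0/1/2) with no missing calls. `ld_prune` is a simplified greedy version of pairwise pruning that mirrors the meaning of `--indep-pairwise 500 50 0.05` (window of 500 variants, step of 50 variants, r² threshold 0.05), but it is not PLINK's exact algorithm. The source does not specify how co-clustering was done, so `assign_ancestry` uses nearest-centroid assignment as a hypothetical stand-in. All function names and the simulated data are illustrative.

```python
import numpy as np


def ld_prune(G, window=500, step=50, r2_max=0.05):
    """Simplified greedy pairwise LD pruning (hypothetical; not PLINK's exact algorithm).

    G: samples x variants dosage matrix (0/1/2), columns in genomic order.
    window and step are counted in variants, mirroring `--indep-pairwise 500 50 0.05`.
    Returns a boolean mask of retained variants.
    """
    n_var = G.shape[1]
    keep = np.ones(n_var, dtype=bool)
    for start in range(0, n_var, step):
        idx = np.flatnonzero(keep[start:start + window]) + start
        if idx.size < 2:
            continue
        r2 = np.corrcoef(G[:, idx], rowvar=False) ** 2
        for a in range(idx.size):
            if not keep[idx[a]]:
                continue
            for b in range(a + 1, idx.size):
                # Drop the later variant of any pair above the r^2 threshold.
                if keep[idx[b]] and r2[a, b] > r2_max:
                    keep[idx[b]] = False
    return keep


def fit_reference_pca(G_ref, n_pcs=10):
    """PCA on reference (1KG-like) samples only.

    Assumes every variant is polymorphic in the reference (true for common
    variants), so the per-variant scale is never zero.
    """
    p = G_ref.mean(axis=0) / 2.0              # reference allele frequencies
    scale = np.sqrt(2.0 * p * (1.0 - p))      # binomial standard deviation
    Z = (G_ref - 2.0 * p) / scale
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_pcs].T                   # variants x PCs
    return p, scale, loadings, Z @ loadings


def project(G_target, p, scale, loadings):
    """Place target samples in the reference PC space using reference p, scale, loadings."""
    return ((G_target - 2.0 * p) / scale) @ loadings


def assign_ancestry(target_scores, ref_scores, ref_labels):
    """Nearest-centroid assignment: a hypothetical stand-in for co-clustering."""
    labels = np.unique(ref_labels)
    centroids = np.stack([ref_scores[ref_labels == lab].mean(axis=0) for lab in labels])
    d2 = ((target_scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pops = np.array(["EUR", "AFR", "AMR", "EAS"])
    # Hypothetical, independent allele frequencies per population (simulation only).
    freqs = rng.uniform(0.05, 0.95, size=(len(pops), 300))

    def simulate(n_per_pop):
        G = np.vstack([rng.binomial(2, f, size=(n_per_pop, f.size)) for f in freqs])
        return G.astype(float), np.repeat(pops, n_per_pop)

    G_ref, ref_labels = simulate(50)   # plays the role of the 1KG reference
    G_tgt, tgt_labels = simulate(10)   # plays the role of eMERGE samples
    p, scale, loadings, ref_scores = fit_reference_pca(G_ref, n_pcs=3)
    called = assign_ancestry(project(G_tgt, p, scale, loadings), ref_scores, ref_labels)
    print("agreement:", np.mean(called == tgt_labels))
```

The demo skips `ld_prune`. This is because simulated allele-frequency differences between populations create correlation between variants in the pooled sample, and a 0.05 threshold would discard most of them. In real data, pruning is applied to the merged genotypes before the PCA step. Projected scores for non-reference samples can also shrink toward zero relative to the reference PCs. For that reason, a visual check of the PC plots remains a useful safeguard on top of any automatic assignment rule.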

We used genetic data from eight eMERGE sites in this work: Cincinnati Children's Hospital Medical Center (CCHMC), Children's Hospital of Philadelphia (CHOP), Columbia University, Mass General Brigham (MGB), Mayo Clinic, Icahn School of Medicine at Mount Sinai, Northwestern University (NU), and Vanderbilt University Medical Center (VUMC). Genome-wide genotype data from the eight sites, imputed against the Haplotype Reference Consortium (HRC) reference panel, were obtained from the eMERGE Network [17, 19]. We merged all eMERGE samples with the 1000 Genomes Project (1KG) Phase 3 data (N=2504) and selected high-quality, common variants shared between the two datasets. We LD-pruned the merged dataset with PLINK (--indep-pairwise 500 50 0.05) to retain a set of approximately independent variants, and calculated principal components (PCs) in the 1KG reference samples using these pruned variants. We then projected the eMERGE samples into the 1KG PC space and assigned each eMERGE sample to one of four 1KG super-populations (European [EUR], African [AFR], Admixed American [AMR], or East Asian [EAS]) based on the reference samples with which it co-clustered. Continental ancestry assignments were verified by visual inspection of the PC plots (Additional File 1: Fig. S1). We further