Chunk #18 — Materials and Methods — Training and validation data sets for predicting type 2 diabetes in Latinos: DIAGRAM, SIGMA and UK Biobank

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Embedded: yes

Text

Our analyses of type 2 diabetes in Latinos used summary association statistics from the DIAGRAM data set and genotypes and phenotypes from the SIGMA data set (row 3 of Table 1). The DIAGRAM data set consists of 12,171 cases and 56,862 controls of European ancestry, for which summary association statistics at 2,473,441 imputed SNPs are publicly available (see Web Resources) (Morris et al., 2012). As noted above, the SIGMA data set consists of 8,214 unrelated Latino samples (3,848 type 2 diabetes cases and 4,366 controls) genotyped at 2,440,134 SNPs after QC. QC procedures are reported elsewhere (SIGMA Type 2 Diabetes Consortium et al., 2014). They include the removal of one individual from each pair of relatives with relatedness greater than 10% (n=532), as well as a principal components analysis (PCA) using EIGENSTRAT (Price et al., 2006) (see Web Resources) to identify and remove samples with evidence of high African or East Asian ancestry (n=181).
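
The relatedness filter described above can be illustrated with a minimal Python sketch. This is not the SIGMA consortium's QC pipeline: the function name `prune_relatives`, the input layout (sample IDs plus pairwise relatedness estimates), the rule for choosing which member of a pair to drop, and the toy data are all assumptions made for illustration. Only the 10% threshold and the sample counts come from the text.

```python
# Illustrative sketch only; the real QC is described in
# SIGMA Type 2 Diabetes Consortium et al. (2014).

def prune_relatives(sample_ids, pairs, threshold=0.10):
    """Drop one individual from each pair whose relatedness exceeds threshold.

    sample_ids: list of sample identifiers (hypothetical layout).
    pairs: iterable of (id_a, id_b, relatedness) tuples (hypothetical layout).
    Returns (kept, removed), with kept in the original sample order.
    """
    removed = set()
    # Visit the most closely related pairs first.
    for id_a, id_b, relatedness in sorted(pairs, key=lambda p: -p[2]):
        if relatedness <= threshold:
            continue  # not related closely enough to filter
        if id_a in removed or id_b in removed:
            continue  # pair already broken by an earlier removal
        removed.add(id_b)  # assumption: arbitrarily drop the second member
    kept = [s for s in sample_ids if s not in removed]
    return kept, removed


# Toy data, not taken from SIGMA.
samples = ["s1", "s2", "s3", "s4"]
pairs = [("s1", "s2", 0.25), ("s2", "s3", 0.12), ("s3", "s4", 0.05)]
kept, removed = prune_relatives(samples, pairs)
print(kept, removed)

# Consistency check on the sample counts reported in the text.
n_cases, n_controls = 3848, 4366
assert n_cases + n_controls == 8214
```

In the toy data, removing `s2` breaks both related pairs, so a single removal suffices. A real pipeline might instead choose which relative to drop based on genotype missingness or phenotype availability.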