Chunk #20 — Materials and Methods — Training and validation data sets for predicting type 2 diabetes in Latinos: DIAGRAM, SIGMA and UK Biobank

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Embedded: yes

Text

We performed a secondary analysis using 113,851 British samples from UK Biobank (Galinsky, Loh, Mallick, Patterson, & Price, 2016) (see Web Resources) as European training data (5,198 type 2 diabetes cases and 108,653 controls; row 4 of Table 1). UK Biobank association statistics were computed with adjustment for 10 PCs (Galinsky, Loh, et al., 2016), estimated using FastPCA (Galinsky, Bhatia, et al., 2016) (see Web Resources). We computed summary statistics for 608,878 genotyped SNPs from UK Biobank after removing A/T and C/G SNPs to eliminate potential strand ambiguity. We analyzed the 187,142 SNPs present in both the SIGMA and UK Biobank data sets. We defined type 2 diabetes cases in UK Biobank as “any diabetes” with “age of diagnosis > 30”. We note that the p-values at two top type 1 diabetes (T1D) loci (rs2476601, rs9268645) were only nominally significant (p∼0.05) for this type 2 diabetes (T2D) phenotype, indicating low contamination with T1D cases.
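The SNP filtering described above (dropping A/T and C/G SNPs, then keeping only SNPs shared with the target data set) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `is_strand_ambiguous` and `filter_and_intersect`, the dictionary layout, and the toy SNP IDs and alleles are all hypothetical and chosen only for illustration.

```python
# Complementary base for each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for A/T and C/G SNPs, whose two alleles are complements.

    For such SNPs the allele pair reads the same on both strands, so the
    strand cannot be inferred from the alleles alone.
    """
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def filter_and_intersect(training_snps, target_snp_ids):
    """Keep training SNPs that are unambiguous and present in the target set.

    training_snps: dict mapping SNP ID -> (allele1, allele2)  (hypothetical layout)
    target_snp_ids: set of SNP IDs available in the target data set
    """
    return {
        snp_id: alleles
        for snp_id, alleles in training_snps.items()
        if not is_strand_ambiguous(*alleles) and snp_id in target_snp_ids
    }


# Toy data with made-up IDs; real analyses would read these from genotype files.
training = {
    "snp1": ("A", "G"),  # unambiguous, shared with target
    "snp2": ("A", "T"),  # strand-ambiguous
    "snp3": ("C", "G"),  # strand-ambiguous
    "snp4": ("C", "T"),  # unambiguous, but absent from target
}
target = {"snp1", "snp2", "snp3"}

kept = filter_and_intersect(training, target)
print(sorted(kept))  # ['snp1']
```

In this toy example only `snp1` survives: `snp2` and `snp3` are removed as strand-ambiguous, and `snp4` is dropped because it is not in the target set.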