Chunk #34 — Results — Analyses of type 2 diabetes in Latinos

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Embedded: yes

Text

In the above results (Table 3 and Fig 2), we allowed each prediction method to optimize its mixing weights via an in-sample fit in the target sample. This procedure could in principle be susceptible to overfitting (Kooperberg, LeBlanc, & Obenchain, 2010; Wray et al., 2013). We did not expect overfitting to be a concern, given the small number of mixing weights optimized (at most 3) relative to the target sample size (8,181) and given our use of adjusted R² as the evaluation metric. To verify this expectation, we repeated our analyses using 10x9-fold cross-validation (see Methods). Methods that use two training populations remained much more accurate than single-ancestry methods, as prediction accuracy decreased only very slightly (2-4% relative decrease vs. Table 3) for each method (S13 Table). These slight decreases are expected, since mixing weights optimized within 10x9-fold cross-validation are slightly suboptimal (due to reduced training data) and prediction accuracy is mildly sensitive to the choice of mixing weights (S2 Fig).
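The contrast between in-sample and cross-validated mixing weights can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated with a continuous phenotype, the two scores stand in for polygenic risk scores from two training populations, and a plain 10-fold split is used, which is an assumption and not necessarily the paper's 10x9-fold design. All function names (`simulate_scores`, `fit_weights`, `in_sample_adj_r2`, `cross_validated_r2`) are hypothetical.

```python
import numpy as np

def simulate_scores(n=8181, seed=0):
    """Simulated stand-in data: two risk scores and a continuous phenotype."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))            # columns: score 1, score 2
    y = X @ np.array([0.2, 0.1]) + rng.standard_normal(n)
    return X, y

def fit_weights(X, y):
    """Least-squares mixing weights (intercept first) for the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    return coef[0] + X @ coef[1:]

def adjusted_r2(y, y_hat, n_weights):
    """Adjusted R^2, penalizing the number of fitted mixing weights."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_weights - 1)

def in_sample_adj_r2(X, y):
    """Fit the mixing weights and evaluate them on the same samples."""
    coef = fit_weights(X, y)
    return adjusted_r2(y, predict(X, coef), X.shape[1])

def cross_validated_r2(X, y, n_folds=10, seed=0):
    """Fit weights on the training folds, predict each held-out fold, and
    return the squared correlation of the pooled out-of-sample predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_hat = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef = fit_weights(X[train_mask], y[train_mask])
        y_hat[test_idx] = predict(X[test_idx], coef)
    return np.corrcoef(y, y_hat)[0, 1] ** 2

if __name__ == "__main__":
    X, y = simulate_scores()
    print("in-sample adjusted R^2:", in_sample_adj_r2(X, y))
    print("cross-validated R^2:   ", cross_validated_r2(X, y))
```

With only two weights fitted on several thousand samples, the two estimates in this simulation should differ only marginally, which matches the reasoning in the text that a handful of mixing weights leaves little room for overfitting.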