Chunk #41 — Discussion

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Embedded: yes

Text

We have shown that combining training data from European samples with training data from the target population attains a >70% relative improvement in prediction accuracy for type 2 diabetes in both Latino and South Asian cohorts, compared to prediction methods that use training data from a single population. In addition, this approach attains a 30% relative improvement in prediction accuracy for height in an African cohort. These relative improvements are robust to overfitting, are consistent with simulations, and reduce the documented gap in risk prediction accuracy between European and non-European target populations (Bustamante, De La Vega, & Burchard, 2011; International Schizophrenia Consortium et al., 2009; Popejoy & Fullerton, 2016; Rosenberg et al., 2010; Scutari et al., 2016; Vilhjálmsson et al., 2015). We note that the NHGRI-EBI GWAS Catalog (MacArthur et al., 2017) lists at least 35 phenotypes with published GWAS data sets in Europeans and in at least one non-European population (with minimum sample size of 8,000), for which our approach could potentially be valuable (S21 Table). Intuitively, our approach leverages both large training sample sizes