Chunk #9 — Materials and Methods — Polygenic risk score using one or two training populations and genetic ancestry

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.

Text

We further define polygenic risk scores that include an ancestry predictor, namely the top principal component (PC) in a given data set. This PC is computed using the union of all available samples (training and validation) from that population. We considered only the top PC in each data set we analyzed, because lower-ranked PCs had a squared correlation with phenotype below 0.005 in every case. We recommend restricting ancestry predictors to PCs whose squared correlation with phenotype is at least 0.005. We define the polygenic risk score LAT+ANC, with mixing weights $\alpha_1$ and $\alpha_2$, as

$$\mathrm{PRS}_{\mathrm{LAT+ANC}} = \alpha_1 \, \mathrm{PRS}_{\mathrm{LAT}} + \alpha_2 \, \mathrm{PC},$$

and the polygenic risk score EUR+LAT+ANC, with mixing weights $\alpha_1$, $\alpha_2$ and $\alpha_3$, as

$$\mathrm{PRS}_{\mathrm{EUR+LAT+ANC}} = \alpha_1 \, \mathrm{PRS}_{\mathrm{EUR}} + \alpha_2 \, \mathrm{PRS}_{\mathrm{LAT}} + \alpha_3 \, \mathrm{PC}.$$

As above, we use two approaches to avoid overfitting. In our primary analyses, we estimate mixing weights on validation data and compute adjusted $R^2$. In our secondary analyses, we estimate mixing weights using cross-validation.
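The PC selection rule and the two ways of scoring a combined predictor can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the following:

- The top PC is computed from mean-centered, variance-standardized genotypes.
- The mixing weights α are ordinary least-squares coefficients, fitted with an intercept.
- Adjusted R² uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), with p mixing weights.
- Cross-validated accuracy is the squared correlation between phenotype and out-of-fold predictions.

The helper names (`top_pc`, `fit_mixing_weights`, `adjusted_r2`, `cv_r2`), the 10-fold split, the validation subset and all simulated data are hypothetical.

```python
import numpy as np


def top_pc(genotypes: np.ndarray) -> np.ndarray:
    """Top-PC score per sample; rows are samples, columns are SNPs.

    Assumption: call this on the union of training and validation samples.
    """
    g = genotypes - genotypes.mean(axis=0)        # center each SNP
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]                 # standardize; drop monomorphic SNPs
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, 0] * s[0]                         # sign of a PC is arbitrary


def squared_corr(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def fit_mixing_weights(predictors: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y on the predictor columns; returns [intercept, alphas...]."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def adjusted_r2(predictors: np.ndarray, y: np.ndarray, coef: np.ndarray) -> float:
    """R^2 on the fitting samples, penalized for the p fitted mixing weights."""
    X = np.column_stack([np.ones(len(y)), predictors])
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, p = X.shape[0], predictors.shape[1]
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def cv_r2(predictors: np.ndarray, y: np.ndarray, k: int = 10, seed: int = 0) -> float:
    """Squared correlation of y with k-fold out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef = fit_mixing_weights(predictors[train], y[train])
        X_test = np.column_stack([np.ones(len(test)), predictors[test]])
        pred[test] = X_test @ coef
    return squared_corr(pred, y)


# Synthetic example (all data simulated; nothing here comes from the study).
rng = np.random.default_rng(1)
n, m = 600, 300
anc = rng.uniform(0, 1, n)                                   # admixture proportion
freqs = 0.1 + 0.8 * np.outer(anc, rng.uniform(0, 1, m))      # ancestry-dependent allele freqs
G = rng.binomial(2, freqs).astype(float)
prs_eur = rng.normal(size=n)
prs_lat = 0.5 * prs_eur + rng.normal(size=n)
y = 0.3 * prs_eur + 0.4 * prs_lat + 2.0 * anc + rng.normal(size=n)

pc1 = top_pc(G)                                   # training + validation samples
use_pc = squared_corr(pc1, y) >= 0.005            # the 0.005 threshold from the text
val = np.arange(n // 2, n)                        # hypothetical validation subset
results = {}
for name, cols in [("LAT+ANC", [prs_lat, pc1]), ("EUR+LAT+ANC", [prs_eur, prs_lat, pc1])]:
    X = np.column_stack(cols)[val]
    coef = fit_mixing_weights(X, y[val])
    results[name] = (coef[1:], adjusted_r2(X, y[val], coef), cv_r2(X, y[val]))
    print(name, "alphas:", np.round(coef[1:], 3),
          "adj R2: %.3f" % results[name][1], "CV R2: %.3f" % results[name][2])
```

Fitting α on the validation samples and then reporting R² on those same samples is optimistic. That is why the first variant penalizes for the number of mixing weights. The cross-validated variant avoids the problem differently: it never scores a sample with weights that were fitted on that sample.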