Chunk #12 — Materials and Methods — Assessment of methods

Source
Multiethnic polygenic risk scores improve risk prediction in diverse populations.

Text

Finally, for analyses in which we needed to use samples from the same cohort both for building the PRS (i.e., estimating the effect sizes $\hat{b}_i$) and for validation, we also used cross-validation. In our primary analyses, we employed 10-fold cross-validation, using 90% of the cohort to estimate $\hat{b}_i$ and the remaining 10% of the cohort to validate predictions (using the adjusted $R^2$ metric with best-fit mixture weights $\hat{\alpha}_k$). In our secondary analyses, we employed 10×9-fold cross-validation, in which 90% of the cohort was used to estimate both $\hat{b}_i$ and $\hat{\alpha}_k$ and the remaining 10% of the cohort was used to validate predictions. To estimate $\hat{\alpha}_k$, we iteratively split the 90% set of training samples into an 80% training-training set and a 10% training-test set. For each of the 9 training-test folds, we estimated $\hat{b}_i$ in the 80% training-training set and computed a PRS for the 10% training-test set; we then performed a single regression of phenotype against each PRS across the entire 90% set of training samples to estimate $\hat{\alpha}_k$. Finally, we re-estimated $\hat{b}_i$ for the final test prediction using the entire 90% set of training samples.
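The 10×9-fold scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: effect sizes are estimated as per-SNP marginal least-squares slopes, the mixture combines one externally derived PRS (`prs_external`) with one in-cohort PRS, and the squared correlation between prediction and phenotype stands in for the adjusted R2 metric. All function and parameter names (`fit_effects`, `score`, `fit_mixture_weights`, `nested_cv_r2`) are hypothetical.

```python
import numpy as np

def fit_effects(G, y):
    """Per-SNP marginal least-squares slopes (assumed stand-in for effect-size estimation)."""
    mu = G.mean(axis=0)
    Gc = G - mu
    denom = (Gc ** 2).sum(axis=0)
    denom[denom == 0] = np.inf  # monomorphic SNPs get a zero effect
    return mu, Gc.T @ (y - y.mean()) / denom

def score(G, model):
    """PRS for the rows of G under a fitted (column means, effects) model."""
    mu, b = model
    return (G - mu) @ b

def fit_mixture_weights(prs, y):
    """Regress phenotype on the PRS columns; returns [intercept, alpha_1, ..., alpha_K]."""
    X = np.column_stack([np.ones(len(y)), prs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def nested_cv_r2(G, y, prs_external, n_outer=10, seed=0):
    """Mean test-fold squared correlation under 10 x 9-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_outer)
    r2 = []
    for t, test in enumerate(folds):
        inner = [f for j, f in enumerate(folds) if j != t]  # the 9 training folds
        train = np.concatenate(inner)                       # 90% of the cohort

        # Out-of-fold in-cohort PRS for every training sample:
        # effects from 8 folds (80%), scored on the held-out fold (10%).
        prs_in = np.full(len(y), np.nan)
        for h, held in enumerate(inner):
            tt = np.concatenate([f for j, f in enumerate(inner) if j != h])
            prs_in[held] = score(G[held], fit_effects(G[tt], y[tt]))

        # Single regression across the entire 90% training set gives the weights.
        train_prs = np.column_stack([prs_external[train], prs_in[train]])
        alpha = fit_mixture_weights(train_prs, y[train])

        # Re-estimate effects on the full 90%, then predict the 10% test fold.
        final = fit_effects(G[train], y[train])
        test_prs = np.column_stack([prs_external[test], score(G[test], final)])
        pred = alpha[0] + test_prs @ alpha[1:]
        r2.append(np.corrcoef(pred, y[test])[0, 1] ** 2)
    return float(np.mean(r2))
```

Here `G` is a samples-by-SNPs float array of genotypes and `y` is the phenotype vector. The key design point is that the weights are fit on PRS values computed out of fold, so no training sample contributes to the effect sizes used to score it, and the test fold is touched only once, for the final prediction.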