Materials and Methods: Assessment of methods

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.

Finally, for analyses in which samples from the same cohort were needed both for building the PRS (i.e., estimating effect sizes b̂_i) and for validation, we also used cross-validation. In our primary analyses, we employed 10-fold cross-validation, using 90% of the cohort to estimate b̂_i and the remaining 10% to validate predictions (using the adjusted R² metric with best-fit mixture weights α̂_k).

In our secondary analyses, we employed 10×9-fold (nested) cross-validation, in which 90% of the cohort was used to estimate both b̂_i and α̂_k and the remaining 10% was used to validate predictions. To estimate α̂_k, we divided the 90% training set into 9 training-test folds, each containing 10% of the cohort. For each of these folds, we estimated b̂_i in the remaining 80% (the training-training set) and computed a PRS for the 10% training-test set, so that every training sample received a PRS computed without its own data. We then performed a single regression of phenotype against each PRS across the entire 90% training set to estimate α̂_k. Lastly, we re-estimated b̂_i for the final test prediction using the entire 90% training set.
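The two cross-validation schemes described above can be sketched in Python on simulated data. This is a minimal illustration under assumptions not taken from the paper: effect sizes b̂_i are per-SNP marginal OLS slopes (this section does not specify how the PRS itself is built), `prs_ext` is a hypothetical stand-in for a PRS derived from another population's data, the mixture weights α̂_k come from an ordinary least-squares regression with an intercept, the nested scheme is scored by squared correlation, and per-fold scores are simply averaged. All function names (`marginal_effects`, `fit_mixing_weights`, `adjusted_r2`, `primary_cv`, `nested_cv`) are hypothetical. The point it shows is where α̂_k is fit: in `primary_cv` the weights are fit in the validation fold itself (hence the adjusted R²), whereas in `nested_cv` they are fit on out-of-fold PRS values within the 90% training set and then applied unchanged to the held-out 10%.

```python
import numpy as np


def marginal_effects(G, y):
    """Per-SNP marginal OLS slopes b_hat_i (assumed estimator: no LD pruning or thresholding)."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return (Gc * yc[:, None]).sum(axis=0) / (Gc ** 2).sum(axis=0)


def fit_mixing_weights(prs, y):
    """OLS of phenotype on the K PRS columns plus an intercept -> [intercept, alpha_1..alpha_K]."""
    X = np.column_stack([np.ones(len(y)), prs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def adjusted_r2(prs, y, coef):
    """Adjusted R^2 of the fitted mixture, penalizing the K fitted weights."""
    X = np.column_stack([np.ones(len(y)), prs])
    resid = y - X @ coef
    yc = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    n, k = prs.shape
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def primary_cv(G, y, prs_ext, n_folds=10, seed=0):
    """10-fold CV: b_hat_i from 90%; best-fit alpha_k and adjusted R^2 on the held-out 10%."""
    parts = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for k, test in enumerate(parts):
        train = np.concatenate([p for j, p in enumerate(parts) if j != k])
        b_hat = marginal_effects(G[train], y[train])
        prs = np.column_stack([prs_ext[test], G[test] @ b_hat])
        coef = fit_mixing_weights(prs, y[test])  # weights fit in the validation fold itself
        scores.append(adjusted_r2(prs, y[test], coef))
    return float(np.mean(scores))  # averaging over folds is an assumption


def nested_cv(G, y, prs_ext, n_folds=10, seed=0):
    """10x9-fold CV: alpha_k from out-of-fold PRS in the 90% set, then b_hat_i refit on all 90%."""
    parts = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for k, test in enumerate(parts):
        inner = [p for j, p in enumerate(parts) if j != k]  # the 9 training-test folds
        train = np.concatenate(inner)
        oof = np.empty(len(y))  # out-of-fold within-cohort PRS, filled for training samples only
        for j, tt in enumerate(inner):
            tr = np.concatenate([p for i, p in enumerate(inner) if i != j])  # 80% training-training set
            oof[tt] = G[tt] @ marginal_effects(G[tr], y[tr])
        # single regression across the whole 90% training set to estimate alpha_k
        coef = fit_mixing_weights(np.column_stack([prs_ext[train], oof[train]]), y[train])
        b_hat = marginal_effects(G[train], y[train])  # re-estimate b_hat_i on the full 90%
        X_test = np.column_stack([np.ones(len(test)), prs_ext[test], G[test] @ b_hat])
        pred = X_test @ coef  # alpha_k held fixed; nothing is fit in the test fold
        scores.append(np.corrcoef(pred, y[test])[0, 1] ** 2)
    return float(np.mean(scores))


# Simulated data (hypothetical): unlinked SNPs, additive effects, one external PRS.
rng = np.random.default_rng(1)
n, m = 2000, 200
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
beta = rng.normal(0.0, 0.1, m)
y = G @ beta + rng.normal(0.0, 1.0, n)
prs_ext = G @ (beta + rng.normal(0.0, 0.1, m))  # hypothetical PRS from another population
primary_score = primary_cv(G, y, prs_ext)
nested_score = nested_cv(G, y, prs_ext)
print(f"primary adjusted R^2: {primary_score:.3f}, nested R^2: {nested_score:.3f}")
```

The nested scheme never fits α̂_k on the samples it scores, at the cost of 9 additional effect-size estimations per outer fold (90 in total).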