Chunk #37 — Discussion

Source: Power and predictive accuracy of polygenic risk scores.
Embedded: yes

Text

When a sample is to be split into two subsets, a roughly even split yields the greatest power for testing association of the score. However, for predicting individual trait values it is more important for the training set to be large, and standard procedures such as 10-fold cross-validation remain preferable. If both testing and prediction are intended from a single sample, a pragmatic approach is to ensure adequate power by allocating about 2000 cases and 2000 controls to the replication sample, providing this is less than half the total, and then to ensure high predictive accuracy by allocating the remainder to the training sample.