To estimate confidence intervals for PRS performance (R2, as explained above), we performed 1,000 bootstrap replicates using the R package boot. To evaluate whether the R2 difference between two PRS models (functionally informed – standard) was significantly greater than 0, we performed 10,000 bootstrap replicates, calculated the R2 difference between the two models in each replicate (delta R2), and assessed its distribution across the 10,000 replicates. Letting N be the number of replicates in which delta R2 < 0, we defined the one-tailed P value for delta R2 > 0 as (N + 1)/10,000.
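For illustration only, the resampling logic described above can be sketched in Python using the standard library. This is not the R package boot used for the analysis. The sketch assumes that R2 is the squared Pearson correlation between phenotype and score, that individuals are resampled with replacement, and that a percentile interval is used; none of these choices is specified in this section. All function and variable names are hypothetical, and the data are synthetic.

```python
import random


def r_squared(y, score):
    """Squared Pearson correlation (an assumed stand-in for the R2 metric)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_s = sum(score) / n
    sxy = sum((a - mean_y) * (b - mean_s) for a, b in zip(y, score))
    syy = sum((a - mean_y) ** 2 for a in y)
    sss = sum((b - mean_s) ** 2 for b in score)
    if syy == 0 or sss == 0:
        return 0.0  # degenerate resample with no variance
    return sxy * sxy / (syy * sss)


def bootstrap_delta_r2(pheno, prs_functional, prs_standard, n_boot=10000, seed=0):
    """Bootstrap delta R2 (functionally informed minus standard).

    Returns the list of delta R2 values and the one-tailed P value
    (N + 1) / n_boot, where N is the number of replicates with delta R2 < 0.
    """
    rng = random.Random(seed)
    n = len(pheno)
    deltas = []
    for _ in range(n_boot):
        # Resample individuals with replacement; the same indices are used
        # for both models so that the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [pheno[i] for i in idx]
        f = [prs_functional[i] for i in idx]
        s = [prs_standard[i] for i in idx]
        deltas.append(r_squared(y, f) - r_squared(y, s))
    n_negative = sum(1 for d in deltas if d < 0)
    p_value = (n_negative + 1) / n_boot
    return deltas, p_value


def percentile_ci(values, level=0.95):
    """Percentile interval (assumed; the interval type is not stated in the text)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    offset = int((1.0 - level) / 2 * last)
    return ordered[offset], ordered[last - offset]


# Synthetic example: the "functional" score tracks the phenotype more closely.
demo_rng = random.Random(1)
pheno = [demo_rng.gauss(0, 1) for _ in range(200)]
prs_functional = [y + demo_rng.gauss(0, 1) for y in pheno]
prs_standard = [y + demo_rng.gauss(0, 2) for y in pheno]

deltas, p_value = bootstrap_delta_r2(pheno, prs_functional, prs_standard, n_boot=1000)
print("delta R2 95% interval:", percentile_ci(deltas))
print("one-tailed P:", p_value)
```

Using the same resampled indices for both models keeps each replicate a within-sample comparison, so delta R2 reflects the difference between the models rather than differences between two independent resamples.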