Chunk #5 — Introduction

Source: Power and predictive accuracy of polygenic risk scores.

Polygenic scores must be estimated from a finite training sample, and their effectiveness for association testing and risk prediction depends on the precision of this estimation as well as on the proportion of variation explained by the polygenic score. The role of sample size has not been thoroughly considered in this context. Several authors have expressed sensitivity and specificity in terms of the genetic variance of a predictor [17], [20]–[22], but they did not distinguish the variance explained by an estimated predictor from that explained by the true predictor, that is, the one that would be estimated from an infinitely large sample. Although large samples lead to small sampling variance for individual marker effects, the errors accumulate across multiple markers, so that the effect of sampling variation on the polygenic score can be considerable. Wray et al. [2] used simulations to study the predictive accuracy of scores estimated from finite case/control studies, but did not obtain an explicit relation between sample size and accuracy. Similarly, the International Schizophrenia Consortium (ISC) [3] used simulations to show empirical relations between sample size and