Chunk #19 — Method — Genome-wide Scoring Procedure

Source: Three mutually informative ways to understand the genetic relationships among behavioral disinhibition, alcohol use, drug use, nicotine use/dependence, and their co-occurrence: twin biometry, GCTA, and genome-wide scoring.
Embedded: yes

Text

Genome-wide scoring with 10-fold cross-validation is computationally demanding, which prevented us from conducting permutation or other tests of statistical significance. Fortunately, the cross-validation statistic used here is the Pearson correlation, which is amenable to shorthand tests of significance. After Fisher's z-transformation, z = arctanh(r), the standard error of the Pearson correlation coefficient is 1/√(N − 3). A z-score of 1.96 is significant at p < .05 (two-tailed). The one-tailed p-value for a given z and N is thus approximately 1 − Φ(z × √N), where Φ is the cumulative distribution function of the standard normal distribution. The average within-family correlation, averaged over all five phenotypes, was .24. Multiplying the total sample size by one minus the squared average within-family correlation yields 7188 × (1 − .24²) ≈ 6774, an estimate of the effective sample size. When N = 6774, a correlation coefficient must be greater than r = .02 to be significant at p < .05. If we conservatively set the effective sample size at 5,000 individuals, a correlation coefficient must exceed r = .024 to be significant at p < .05. For all analyses we covaried out the linear effects of age, sex, year of birth, generational status (parent/offspring), and the first 10 genetic principal components.
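
As an illustration only, the shorthand significance test described above can be sketched in Python using Fisher's z-transformation (the inverse hyperbolic tangent of r) with standard error 1/√(N − 3). The function names are hypothetical and are not taken from the original analysis. The one-tailed default is an assumption chosen to match the 1 − Φ(·) expression in the text, and the sketch uses N − 3 rather than the large-sample shortcut N.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def effective_sample_size(n_total: float, avg_within_family_r: float) -> float:
    """Effective N = N * (1 - r^2), the adjustment described in the text."""
    return n_total * (1.0 - avg_within_family_r ** 2)


def fisher_p_value(r: float, n: float, two_tailed: bool = False) -> float:
    """Approximate p-value for a Pearson r via Fisher's z = arctanh(r)."""
    # Dividing z by its standard error 1/sqrt(n - 3) gives a standard normal deviate.
    z_stat = math.atanh(r) * math.sqrt(n - 3)
    if two_tailed:
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z_stat)))
    return 1.0 - STD_NORMAL.cdf(z_stat)  # upper-tail probability


def critical_r(n: float, alpha: float = 0.05, two_tailed: bool = False) -> float:
    """Correlation at which the Fisher-z test reaches significance level alpha."""
    tail = alpha / 2.0 if two_tailed else alpha
    z_crit = STD_NORMAL.inv_cdf(1.0 - tail)
    # Back-transform the critical z to the correlation scale.
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    n_eff = effective_sample_size(7188, 0.24)
    print(f"effective N: {n_eff:.0f}")
    for n in (n_eff, 5000):
        print(
            f"N = {n:.0f}: one-tailed r > {critical_r(n):.4f}, "
            f"two-tailed r > {critical_r(n, two_tailed=True):.4f}"
        )
```

Under these assumptions, the one-tailed threshold comes out near .020 for N = 6774 and near .023 for N = 5,000. The corresponding two-tailed thresholds are somewhat larger, roughly .024 and .028.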