Chunk #48 — Materials and Methods — Derivation of the sum-score

Source
Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

Text

Let z be the vector of Z-statistics obtained by regressing the phenotype on each of the n SNPs within a gene region. By construction, each Z-statistic has zero mean under the null. When both the outcome trait and the genotypes are standardized, the linear regression Z-statistics are essentially the scalar products of the genotype and phenotype vectors. In other words, each Z-statistic in the region represents a weighted average of the same set of independent, identically distributed random variables. It can be shown that the correlation between two such mixtures, i.e. two Z-statistics, equals the correlation between the weights, i.e. the correlation between the corresponding SNPs. Because each Z-statistic has unit variance under the null, the covariance matrix of z is simply the pairwise SNP-by-SNP correlation matrix, denoted by Σ. Furthermore, the central limit theorem ensures that, for a sufficiently large sample size, the Z-statistics are normally distributed. Taken together, these facts imply that, under the null hypothesis that no signal is present, z follows a multivariate normal distribution, z ∼ N_n(0, Σ). For a detailed derivation see, for example, the supplementary material of Xu et al. [44]. Note that the between-SNP correlation matrix Σ can be estimated from external data [17,45].
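The claim that the null covariance of z matches the SNP correlation matrix Σ can be checked numerically. The following is a minimal simulation sketch, not the authors' implementation: the function names (`simulate_genotypes`, `simulate_null_z`), the AR(1)-style latent correlation used to create linkage between SNPs, and all sample sizes are assumptions chosen for illustration. It uses the simplified form of the Z-statistic described above (scalar product of standardized genotype and phenotype, scaled by the square root of the sample size), which is only approximately equal to the exact regression Z-statistic.

```python
import numpy as np


def simulate_genotypes(n_samples, n_snps, rho, rng):
    """Hypothetical helper: correlated 0/1/2 genotypes from two thresholded haplotypes."""
    idx = np.arange(n_snps)
    # Assumed AR(1)-style correlation between latent variables of nearby SNPs.
    latent_cov = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(latent_cov)
    # Two haplotypes per individual; an allele is present when the latent value is negative.
    latent = rng.standard_normal((2, n_samples, n_snps)) @ chol.T
    return (latent < 0).sum(axis=0).astype(float)


def simulate_null_z(genotypes, n_rep, rng):
    """Approximate Z-statistics for n_rep phenotypes drawn independently of the genotypes."""
    n_samples = genotypes.shape[0]
    # Standardize each SNP column to zero mean and unit variance.
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    # Null phenotypes: no SNP has any effect.
    y = rng.standard_normal((n_samples, n_rep))
    y = (y - y.mean(axis=0)) / y.std(axis=0)
    # Scalar product of standardized genotype and phenotype, scaled by sqrt(sample size).
    return (g.T @ y / np.sqrt(n_samples)).T  # shape: (n_rep, n_snps)


rng = np.random.default_rng(0)
genotypes = simulate_genotypes(n_samples=300, n_snps=5, rho=0.7, rng=rng)

# Sigma: pairwise SNP-by-SNP correlation matrix.
sigma = np.corrcoef(genotypes, rowvar=False)

# Empirical covariance of z across many null phenotypes.
z = simulate_null_z(genotypes, n_rep=10_000, rng=rng)
empirical_cov = np.cov(z, rowvar=False)

max_abs_diff = np.abs(empirical_cov - sigma).max()
print("largest |cov(z) - Sigma| entry:", max_abs_diff)
```

In this sketch Σ is computed from the same genotypes used to form z; replacing it with a correlation matrix estimated from an external reference panel, as the text suggests, would add estimation error on top of the sampling noise seen here.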