Chunk #6 — Results — TWAS performance in simulation and cross-validation

Source: Integrative approaches for large-scale transcriptome-wide association studies.

We evaluated whether the expression of the 6,924 highly heritable genes could be accurately imputed from cis-SNP genotype data alone in these three cohorts. In each tissue, we used cross-validation to compare predictions from the best cis-eQTL with those from all SNPs at the locus, modeled either with a best linear unbiased predictor (BLUP) or with a Bayesian model [30,31] (Methods). On average, the Bayesian sparse linear mixed model (BSLMM) [31], which uses all cis-SNPs and estimates the underlying effect-size distribution, attained the best performance, with a 32% gain in prediction R² over a prediction computed using only the top cis-eQTL (Figure 4, Supplementary Figure 3). The BSLMM exhibited a long tail of increased accuracy, more than doubling the prediction R² for 25% of genes (Supplementary Figure 4). In contrast to complex traits, where hundreds of thousands of training samples are required for accurate prediction [32,33], a substantial portion of the variance in expression can be predicted at current sample sizes because the cis region contains a much smaller number of independent SNPs [21]. Furthermore, larger training sample sizes will continue to increase the total number of