Chunk #16 — Pitfalls of the analysis — Pitfall 1: Validation and discovery sample overlap

Source: Pitfalls of predicting complex traits from SNPs.
Embedded: yes

Text

occurs is when testing the prediction in the discovery sample, i.e., when the same data are used both to estimate the effects of SNPs on the phenotype and to make predictions [53, 54]. We illustrate the overlap pitfall with examples in dairy cattle, Drosophila and human populations (FIG 2a-c). For example, the authors of a GWAS on ~150 sequenced inbred lines of Drosophila [54], in which prediction was tested in the discovery sample, concluded that 6–10 SNPs selected from >1 million SNPs together explained 51–72% of variation in the lines (depending on the trait analysed). However, a cross-validated Bayesian prediction analysis using all genetic markers on the same data found that the predictor could explain only 6% of phenotypic variation [51].
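The inflation caused by overlap can be reproduced with a small simulation. The sketch below is illustrative only and does not use the Drosophila data or the Bayesian method cited above: it assumes 150 inbred lines, 20,000 independent simulated SNPs (no linkage disequilibrium) and a phenotype that is pure noise, so the true variance explained by any SNP set is zero. It selects the 10 SNPs most correlated with the phenotype and compares the R² obtained in the discovery sample with a 5-fold cross-validated R² in which SNP selection and effect estimation are repeated within each training fold. All function names and parameter values are assumptions made for this example.

```python
import numpy as np


def r_squared(y, y_hat):
    """Proportion of variance in y explained by y_hat (can be negative)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def top_k_snps(X, y, k):
    """Indices of the k SNPs with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(corr))[:k]


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, effects...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(X, beta):
    return beta[0] + X @ beta[1:]


def overlap_demo(n=150, m=20000, k=10, folds=5, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: inbred lines are homozygous, so genotypes are coded 0 or 2.
    X = 2.0 * rng.integers(0, 2, size=(n, m))
    # Assumption: the phenotype is pure noise, so no SNP has a real effect.
    y = rng.standard_normal(n)

    # Pitfall: select SNPs, estimate effects and evaluate in the same sample.
    idx = top_k_snps(X, y, k)
    r2_overlap = r_squared(y, predict(X[:, idx], fit_ols(X[:, idx], y)))

    # Cross-validation: selection and estimation use the training lines only.
    perm = rng.permutation(n)
    y_hat = np.empty(n)
    for test in np.array_split(perm, folds):
        train = np.setdiff1d(perm, test)
        idx = top_k_snps(X[train], y[train], k)
        beta = fit_ols(X[train][:, idx], y[train])
        y_hat[test] = predict(X[test][:, idx], beta)
    r2_cv = r_squared(y, y_hat)
    return r2_overlap, r2_cv


if __name__ == "__main__":
    r2_overlap, r2_cv = overlap_demo()
    print(f"R^2 in discovery sample: {r2_overlap:.2f}")
    print(f"R^2 cross-validated:     {r2_cv:.2f}")
```

Under these assumptions the discovery-sample R² is large although no SNP has a real effect, whereas the cross-validated R² is close to zero or negative. Selection must be repeated inside each fold: choosing the SNPs once on all lines and cross-validating only the effect estimates would leave the overlap in place.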