occurs is when testing the prediction in the discovery sample, i.e., when the same data are used both to estimate the effects of SNPs on the phenotype and to make predictions53,54. We illustrate the overlap pitfall with examples in dairy cattle, Drosophila and human populations (FIG. 2a–c). For example, in a GWAS on ~150 sequenced inbred lines of Drosophila54 in which this was done, the authors concluded that 6–10 SNPs selected from >1 million SNPs together explained 51–72% of the variation among the lines (depending on the trait analysed). However, a cross-validated Bayesian prediction analysis that used all genetic markers on the same data found that the predictor could explain only 6% of phenotypic variation51.
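
The size of this bias can be illustrated with a small simulation. The sketch below is not the analysis of any of the cited studies; it assumes a hypothetical data set of 150 lines and 20,000 independent biallelic markers, with a phenotype that is pure noise, so the true variance explained is zero. It selects the 8 markers most strongly associated with the phenotype and compares the variance they appear to explain when prediction is tested in the discovery sample with the value obtained when marker selection and effect estimation are repeated inside each fold of a 5-fold cross-validation. All names and parameter values (`n_lines`, `n_snps`, `n_top`, the number of folds) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_snps, n_top = 150, 20_000, 8   # illustrative sizes, not taken from the cited studies

# Hypothetical genotypes (0/1 coding for inbred lines) and a phenotype that is
# pure noise: by construction no SNP has any real effect.
X = rng.integers(0, 2, size=(n_lines, n_snps)).astype(float)
y = rng.standard_normal(n_lines)

def top_snps(X, y, k):
    """Indices of the k SNPs with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(X, beta):
    return beta[0] + X @ beta[1:]

# Overlap pitfall: select SNPs, estimate effects and test prediction in the same sample.
idx = top_snps(X, y, n_top)
beta = fit_ols(X[:, idx], y)
r2_in = np.corrcoef(predict(X[:, idx], beta), y)[0, 1] ** 2

# Cross-validation: selection and estimation use only the training lines of each fold.
folds = np.array_split(rng.permutation(n_lines), 5)
y_hat = np.empty(n_lines)
for test in folds:
    train = np.setdiff1d(np.arange(n_lines), test)
    idx_cv = top_snps(X[train], y[train], n_top)
    beta_cv = fit_ols(X[train][:, idx_cv], y[train])
    y_hat[test] = predict(X[test][:, idx_cv], beta_cv)
r2_cv = np.corrcoef(y_hat, y)[0, 1] ** 2

print(f"R^2 tested in the discovery sample: {r2_in:.2f}")
print(f"R^2 under 5-fold cross-validation:  {r2_cv:.2f}")
```

Because the simulated phenotype has no genetic basis, any variance that appears to be explained in the discovery sample comes entirely from choosing the best of many markers in a small sample and then re-using the same data for validation. Note that the squared correlation is used here only as a simple summary; it does not distinguish a positive from a negative out-of-sample correlation.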