A variation on this pitfall arises when a proportion of the individuals in the validation sample are also in the discovery sample; the bias is then proportional to the fraction of the validation sample that was also in the discovery set (see BOX 2). In practice, it may be difficult to ascertain whether any of the validation individuals were also in the discovery set, particularly when only summary statistics (i.e., estimates and standard errors of SNP effects, and allele frequencies) are available, for example from public databases. We use cattle data (ref. 44) to illustrate the inflation in the variance explained by a SNP predictor when the validation sample is included in the discovery steps (Fig. 2c).
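
The effect of partial overlap can be sketched with a small simulation. This is a minimal illustration on simulated data, not the cattle analysis of Fig. 2c: the function name `validation_r2`, the sample sizes, the number of SNPs and the choice of a phenotype with no true SNP effects are all assumptions made for the example. Because the simulated phenotype is pure noise, any variance "explained" in the validation sample reflects bias rather than real predictive ability.

```python
import numpy as np

def validation_r2(overlap, n_disc=500, n_val=200, n_snps=2000, n_top=50, seed=0):
    """Squared correlation between a SNP score and phenotype in a validation
    sample in which a fraction `overlap` of individuals also sit in discovery.
    All sizes are illustrative assumptions, not values from the text."""
    rng = np.random.default_rng(seed)

    # Genotypes coded 0/1/2 (allele frequency 0.5); phenotype is pure noise,
    # i.e. no SNP has a true effect.
    g_disc = rng.binomial(2, 0.5, size=(n_disc, n_snps)).astype(float)
    y_disc = rng.standard_normal(n_disc)
    g_new = rng.binomial(2, 0.5, size=(n_val, n_snps)).astype(float)
    y_new = rng.standard_normal(n_val)

    # Build the validation sample: the first n_shared individuals are reused
    # from discovery, the remainder are independent.
    n_shared = int(round(overlap * n_val))
    g_val = np.vstack([g_disc[:n_shared], g_new[n_shared:]])
    y_val = np.concatenate([y_disc[:n_shared], y_new[n_shared:]])

    # Discovery step: marginal least-squares effect of each SNP, then keep
    # the n_top SNPs with the largest absolute estimated effects.
    g_centered = g_disc - g_disc.mean(axis=0)
    y_centered = y_disc - y_disc.mean()
    beta = g_centered.T @ y_centered / (g_centered ** 2).sum(axis=0)
    top = np.argsort(-np.abs(beta))[:n_top]

    # Validation step: score individuals and correlate score with phenotype.
    score = g_val[:, top] @ beta[top]
    return np.corrcoef(score, y_val)[0, 1] ** 2

if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"overlap={frac:.2f}  R2={validation_r2(frac):.3f}")
```

With no overlap the squared correlation should sit near zero, as expected for a trait with no SNP effects, and it should rise as more validation individuals are drawn from the discovery sample. The exact shape of that increase depends on the simulation settings, so the sketch should be read as qualitative only.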