The effects of SNPs on a trait must be estimated from a sample of finite size, so they are estimated with some sampling error. If only a few loci affected a trait, their effects could be estimated quite accurately, but most complex traits are controlled by a very large number of largely unknown loci35. Therefore, the discovery stage, in which the prediction equation is estimated, may use a genome-wide panel of millions of SNPs. Because the true effects of most SNPs are small, these effects are estimated with low accuracy unless a very large discovery sample is used. In a random-mating population, the correlation between phenotype and a predictor that uses all SNPs simultaneously can be expressed as a function of the effective population size (or of the effective number of independent chromosome segments, which itself depends on effective population size), the heritability and the size of the discovery sample (Equation 1, BOX 1)36–38. Specifically, SNP effects are estimated more accurately as the size of the discovery cohort increases (BOX 1 figure), whereas the estimated or predicted effect sizes of rare variants will remain difficult to verify even in large samples, because few individuals carry the rare allele.
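The dependence on discovery sample size can be explored numerically. The sketch below assumes a widely used approximation of the same form (after Daetwyler et al. 2008): the correlation between true genetic value and the SNP predictor is sqrt(N·h² / (N·h² + Me)), and the correlation with phenotype is h times that. It is not necessarily identical to Equation 1 in BOX 1. The helper `effective_segments` uses one published approximation for Me (Goddard 2009); other forms exist, so treat that choice as an assumption. A second hypothetical helper, `snp_effect_se`, shows why rare variants are hard to pin down: under Hardy-Weinberg equilibrium the standard error of a single additive SNP effect scales as 1/sqrt(2·N·p·(1−p)), which grows as the minor allele frequency p shrinks. All parameter values in the demo are made up for illustration.

```python
import math


def effective_segments(ne: float, genome_length_morgans: float) -> float:
    """Hypothetical helper: effective number of independent chromosome segments.

    Assumes Me = 2*Ne*L / ln(4*Ne*L) (Goddard 2009). Other approximations,
    e.g. Me = 2*Ne*L, are also used; this choice is an assumption.
    """
    x = ne * genome_length_morgans
    return 2.0 * x / math.log(4.0 * x)


def expected_r_phenotype(n: int, h2: float, m_e: float) -> float:
    """Expected correlation between phenotype y and an all-SNP predictor g_hat.

    Assumes r(g, g_hat) = sqrt(N*h2 / (N*h2 + Me)) and r(y, g_hat) = h * r(g, g_hat),
    so the result approaches h = sqrt(h2) as N grows without bound.
    """
    if n <= 0 or m_e <= 0 or not 0.0 < h2 <= 1.0:
        raise ValueError("need n > 0, m_e > 0 and 0 < h2 <= 1")
    r_g = math.sqrt(n * h2 / (n * h2 + m_e))
    return math.sqrt(h2) * r_g


def snp_effect_se(n: int, maf: float, residual_var: float = 1.0) -> float:
    """Approximate standard error of one additive SNP effect estimate.

    Assumes genotypes coded 0/1/2 in Hardy-Weinberg equilibrium, so the
    genotype variance is 2*p*(1-p): SE = sqrt(residual_var / (2*N*p*(1-p))).
    """
    if n <= 0 or not 0.0 < maf < 1.0:
        raise ValueError("need n > 0 and 0 < maf < 1")
    return math.sqrt(residual_var / (2.0 * n * maf * (1.0 - maf)))


if __name__ == "__main__":
    # Illustrative, made-up inputs; not estimates from the source.
    m_e = effective_segments(ne=10_000, genome_length_morgans=30)
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"N={n:>9,}  r(y, g_hat)={expected_r_phenotype(n, 0.5, m_e):.3f}")
    for p in (0.5, 0.05, 0.005):
        print(f"MAF={p:<6} SE(beta_hat) at N=100,000: {snp_effect_se(100_000, p):.4f}")
```

Under these assumptions the predictor's correlation with phenotype rises towards, but never exceeds, sqrt(h²) as N increases, and at a fixed N the standard error for a SNP with minor allele frequency 0.005 is roughly ten times that for a common SNP with frequency 0.5.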