A third problem in assessing the predictive properties of SPOT is the extraordinary challenge of assembling a sufficiently large collection of validated ‘true’ causal variants for common complex disease on which an assessment of prediction can be based. Genotype–phenotype correlations can be validated using the rigorous standards established with the advent of GWAS: statistical significance after correction for genome-wide multiple testing (1), proper adjustment for population stratification verified by an acceptable genomic inflation factor (18) and replication in independent studies (1). Even then, a critical remaining issue is distinguishing the actual causal variants from a potentially large number of LD proxies (1,38). Association statistics are virtually indistinguishable for SNPs in strong LD, so the actual causal variants must be identified by other means, such as functional studies. The problem is that a prediction algorithm could return ‘positive’ for an associated but non-causal SNP and ‘negative’ for the causal variant with which it is in LD. Treating the associated SNP as the ‘true’ variant could then artificially inflate the estimate of predictive performance. By contrast, confirmation of pathogenesis may be more straightforward for highly penetrant mutations causing rare Mendelian disorders.
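The labelling problem described above can be illustrated with a toy calculation. The sketch below is not part of SPOT and uses entirely hypothetical SNP identifiers and prediction calls; it assumes only that each locus has one reported (associated) SNP and one actual causal SNP, which may or may not coincide. It compares the sensitivity a prediction algorithm would appear to have when the reported SNPs are treated as the ‘true’ set with the sensitivity measured against the actual causal variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    reported_snp: str  # SNP carrying the validated association signal
    causal_snp: str    # actual causal variant (usually unknown in practice)


def sensitivity(truth_set, predicted_positive):
    """Fraction of the 'true' SNPs that the predictor calls positive."""
    truth = set(truth_set)
    if not truth:
        raise ValueError("truth_set must not be empty")
    return len(truth & set(predicted_positive)) / len(truth)


# Hypothetical loci: at two of the four, the reported SNP is only an LD proxy.
loci = [
    Locus(reported_snp="snpA1", causal_snp="snpA1"),
    Locus(reported_snp="snpB1", causal_snp="snpB2"),
    Locus(reported_snp="snpC1", causal_snp="snpC2"),
    Locus(reported_snp="snpD1", causal_snp="snpD1"),
]

# Hypothetical predictor output: positive for three reported SNPs,
# negative for the causal proxies snpB2 and snpC2 and for snpD1.
predicted_positive = {"snpA1", "snpB1", "snpC1"}

# Sensitivity when the reported SNPs are assumed to be the causal variants.
apparent = sensitivity([l.reported_snp for l in loci], predicted_positive)

# Sensitivity against the actual causal variants.
actual = sensitivity([l.causal_snp for l in loci], predicted_positive)

print(f"apparent sensitivity: {apparent:.2f}")
print(f"actual sensitivity:   {actual:.2f}")
```

In this constructed example the apparent sensitivity exceeds the actual one because two positive calls fall on non-causal proxies. The direction and size of the bias in real data would depend on how often reported SNPs are themselves causal, which is precisely what is difficult to establish.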