Overall, we observed that imputation works well for the newly discovered SNPs, although not as well as for frequency-matched SNPs on the available genotyping arrays, even though newly discovered SNPs show greater haplotype sharing. This difference may be due to an ascertainment bias in the discovery and choice of SNPs on the arrays. Most SNPs in HapMap and on arrays were originally detected by sequencing a few individuals, who represent only a fraction of the haplotypes in the population (ref. 18); these haplotypes are better captured by the arrays (which focused on SNPs that served as good proxies) than are the haplotypes carrying newly discovered SNPs. The difference is clearest in a comparison of nearby, frequency-matched SNP pairs drawn either from the array or from ENCODE: among SNPs with two copies of the minor allele, two frequency-matched ENCODE SNPs are perfect proxies for each other only 5% of the time, whereas the fraction is 70–80% for a pair of frequency-matched array SNPs (Supplementary Fig. 9). This highlights the need for caution in extrapolating from low-frequency array SNPs to low-frequency SNPs discovered by sequencing.
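To make the perfect-proxy comparison concrete, the following is a minimal Python sketch of how such a fraction could be computed from phased haplotypes. It is an illustration under stated assumptions, not the procedure used for Supplementary Fig. 9: "perfect proxy" is taken to mean r² = 1 (the two SNPs split the haplotypes identically, up to allele labelling), "nearby" is taken to mean within a hypothetical distance threshold `max_dist`, and all function names and the toy data are invented for the example.

```python
from itertools import combinations


def minor_allele_count(alleles):
    """Number of copies of the rarer allele among 0/1-coded haplotypes."""
    ones = sum(alleles)
    return min(ones, len(alleles) - ones)


def is_perfect_proxy(a, b):
    """True if two polymorphic SNPs have r^2 = 1 on phased haplotypes.

    For biallelic sites this holds exactly when the allele vectors are
    identical or exact complements, so no floating-point r^2 is needed.
    """
    if minor_allele_count(a) == 0 or minor_allele_count(b) == 0:
        return False  # a monomorphic site cannot tag anything
    same = all(x == y for x, y in zip(a, b))
    flipped = all(x != y for x, y in zip(a, b))
    return same or flipped


def perfect_proxy_fraction(snps, mac=2, max_dist=10_000):
    """Fraction of nearby, frequency-matched SNP pairs that are perfect proxies.

    snps     : list of (position, alleles) with alleles a tuple of 0/1
    mac      : minor allele count both SNPs must share (assumed matching rule)
    max_dist : hypothetical definition of "nearby", in base pairs
    Returns None when no pair qualifies.
    """
    matched = [(pos, al) for pos, al in snps if minor_allele_count(al) == mac]
    total = hits = 0
    for (p1, a1), (p2, a2) in combinations(matched, 2):
        if abs(p1 - p2) > max_dist:
            continue
        total += 1
        hits += is_perfect_proxy(a1, a2)
    return hits / total if total else None


# Toy data: six haplotypes, four SNPs that each have two minor-allele copies.
toy_snps = [
    (100, (1, 1, 0, 0, 0, 0)),
    (150, (1, 1, 0, 0, 0, 0)),   # same carriers as the SNP at position 100
    (200, (0, 0, 1, 1, 0, 0)),   # different carriers
    (5000, (1, 1, 0, 0, 0, 0)),  # same carriers, but outside the window below
]
toy_fraction = perfect_proxy_fraction(toy_snps, mac=2, max_dist=500)
```

Running the same function separately on array SNPs and on ENCODE SNPs would yield the two fractions being contrasted; in this toy set, one of the three qualifying pairs is a perfect proxy pair.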