Chunk #64 — Discussion — Extending our results to future studies

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

Our cross-validation experiments have provided a wealth of information about how to use existing imputation resources like HapMap 3, but these datasets do not capture the full range of features that will be present in future reference panels. For example, our results are based on data from commercial SNP arrays, whose composition is biased toward variants that share alleles across populations. Consequently, population-specific accuracy contributions like the ones seen in Figure 1 should not be treated as quantitative predictions for newly discovered variants. Although we could have used 1000 Genomes data to address the SNP ascertainment issue, the data available when we were preparing this manuscript contained smaller sample sizes and a narrower sampling of human genetic diversity than HapMap 3. We therefore decided to focus on HapMap 3 as a model of future 1000 Genomes reference panels. We have also run similar imputation experiments with an interim release of the 1000 Genomes Phase I haplotypes, and we have continued to see benefits from using ancestrally inclusive reference panels (B. Howie; unpublished data).