Chunk #36 — THE FUTURE: COMBINING IMPUTATION WITH NEW SEQUENCING TECHNOLOGIES

Source: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.
Embedded: yes

Text

With the rapid development of very high-throughput re-sequencing technologies [Bentley, 2006], it is often proposed that genotyping-based approaches will soon become outdated. Re-sequencing-based approaches capture variants that are absent from public databases, potentially including population-specific variants. Our haplotyping approach can use whole-genome re-sequencing data as input. In this setting, it uses information from individuals with similar haplotypes to reconstruct patterns of variation in regions where deep coverage is not available. In principle, the approach could help describe regions that, by chance, are poorly covered in a particular sequencing experiment, or allow many individuals to be evaluated economically. To evaluate the possibilities, we simulated data for ten 1 Mb regions and simulated shotgun sequence data for each region. We simulated reads that were only 32 base pairs long, with a per-base-pair error rate of 0.2%. Very roughly, these settings correspond to the performance of early versions of next-generation re-sequencing technologies; newer versions of these technologies can generate longer and more accurate reads and should thus outperform the simulations presented here. We then
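
To make the simulated read parameters concrete, the following is a minimal Python sketch of shotgun read sampling with 32 bp reads and a 0.2% per-base error rate. It is an illustration only, not the simulation code used in the paper: the function names `simulate_reads` and `expected_depth`, the uniform placement of read start positions, the substitution-only error model, and the use of a single haploid sequence are all assumptions introduced here.

```python
import random

BASES = "ACGT"


def simulate_reads(haplotype, n_reads, read_len=32, error_rate=0.002, rng=None):
    """Sample reads from a haplotype string and add substitution errors.

    Assumptions (not taken from the paper): read starts are uniform over
    the region, and every error replaces a base with one of the other
    three bases chosen at random.
    Returns a list of (start_position, read_sequence) tuples.
    """
    rng = rng or random.Random()
    max_start = len(haplotype) - read_len
    if max_start < 0:
        raise ValueError("haplotype is shorter than the read length")

    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, max_start)  # inclusive on both ends
        bases = list(haplotype[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with a different base so the error is visible.
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


def expected_depth(n_reads, read_len, region_len):
    """Average per-base depth: total sequenced bases / region length."""
    return n_reads * read_len / region_len
```

As a worked example of the arithmetic, a hypothetical 62,500 reads of 32 bp over a 1 Mb region give an average depth of 62,500 × 32 / 1,000,000 = 2 reads per base; the read count here is chosen only to illustrate the calculation. At a 0.2% error rate, each 32 bp read carries on average 32 × 0.002 = 0.064 erroneous bases, so most reads are error-free.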