Chunk #38 — THE FUTURE: COMBINING IMPUTATION WITH NEW SEQUENCING TECHNOLOGIES

Source
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.

Text

For each site, we counted the number of times that the reference base or an alternative base was sequenced in each individual. For computational convenience, downstream analyses considered only sites where both bases were observed several times (see Appendix for detailed methods and implementation details); at all other sites, we assigned the most frequently sampled base. On this scale, the shotgun re-sequencing approach typically characterized ~4,000 polymorphic sites across the sampled individuals, ~4× the SNP density of the Phase II HapMap. Even relatively light shotgun re-sequencing provided very accurate haplotypes for each individual. For example, when 400 individuals were sequenced at 4× depth, there were on average only 18.97 errors per individual (over 1,000,000 base pairs). Across the ~980,000 sites that were monomorphic in the population, only 82 false polymorphisms were called on average. Accuracy was also excellent at sites that were polymorphic in the population. For example, 3,558 of the 3,641 simulated polymorphic sites with MAF > 0.5% were identified and, at these sites, alleles were called with an accuracy of 99.93% (see Tables VI and VII). For
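
The site-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `classify_sites`, the threshold `MIN_OBS = 3` (the text says only "several times"), and the choice to pool counts across individuals before applying the threshold are all assumptions made for illustration. The second half of the block simply re-expresses the figures quoted in the text as rates.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold: the source says only that both bases must be
# observed "several times"; the actual value is given in the paper's Appendix.
MIN_OBS = 3

# counts[i][s] = (ref_count, alt_count) for individual i at site s
Counts = List[List[Tuple[int, int]]]


def classify_sites(counts: Counts, min_obs: int = MIN_OBS) -> Tuple[List[int], Dict[int, str]]:
    """Split sites into candidates for downstream analysis and fixed calls.

    Assumption: read counts are pooled across individuals before the
    threshold is applied. Sites failing the threshold are assigned the
    most frequently sampled base (ties go to the reference base here).
    """
    candidates: List[int] = []
    fixed: Dict[int, str] = {}
    n_sites = len(counts[0])
    for s in range(n_sites):
        ref_total = sum(ind[s][0] for ind in counts)
        alt_total = sum(ind[s][1] for ind in counts)
        if ref_total >= min_obs and alt_total >= min_obs:
            candidates.append(s)  # both bases seen often enough: keep the site
        else:
            fixed[s] = "ref" if ref_total >= alt_total else "alt"
    return candidates, fixed


# Toy data: 3 individuals, 3 sites (invented for illustration only).
example_counts: Counts = [
    [(4, 0), (2, 2), (0, 1)],
    [(3, 0), (1, 3), (5, 0)],
    [(5, 1), (0, 4), (4, 0)],
]
candidates, fixed = classify_sites(example_counts)
print("candidate sites:", candidates)
print("fixed calls:", fixed)

# Figures quoted in the text, re-expressed as rates.
errors_per_bp = 18.97 / 1_000_000        # errors per individual per base pair
false_poly_rate = 82 / 980_000           # false polymorphisms per monomorphic site
detection_rate = 3_558 / 3_641           # share of MAF > 0.5% sites identified
print(f"errors per bp: {errors_per_bp:.3e}")
print(f"false polymorphism rate: {false_poly_rate:.3e}")
print(f"detection rate: {detection_rate:.4f}")
```

In the toy data only the middle site has both bases observed at least three times, so it is the only one passed to downstream analysis. The other two sites are set to the reference base.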