Chunk #76 — Discussion — Computational strategies for imputation with large, sequence-based reference panels

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

These trends should persist as imputation datasets continue to grow: clustering models will need to add even more states to their HMMs to remain competitive on accuracy, whereas the closest k (or k_hap) surrogate family haplotypes will become even more informative, thereby enhancing the running time and accuracy advantages of methods like IMPUTE2. The natural endpoint of this process will arrive when so many genomes have been sequenced that imputation requires just a handful of the closest genealogical neighbors, which is where “surrogate parent” methods, like the one developed by Kong et al. (2008), will take hold. Until that point is reached, we suggest that our surrogate family approximation will remain an attractive way to balance accuracy and speed.
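To make the surrogate family idea concrete, the following is a minimal sketch of the selection step only: given one target haplotype and a reference panel, keep the k panel haplotypes that differ from the target at the fewest sites. It assumes haplotypes are coded as 0/1 alleles over the same set of sites and that closeness is measured by Hamming distance. The function names `hamming` and `closest_k` and the toy panel are hypothetical illustrations, not part of IMPUTE2, and the sketch omits the HMM that would then be run on the selected haplotypes.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(x != y for x, y in zip(a, b))


def closest_k(target: Sequence[int],
              panel: Sequence[Sequence[int]],
              k: int) -> List[int]:
    """Return indices of the k panel haplotypes nearest to the target.

    Ties in distance are broken by panel index so the result is deterministic.
    If k exceeds the panel size, every index is returned in ranked order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(panel)),
                    key=lambda i: (hamming(target, panel[i]), i))
    return ranked[:k]


# Toy reference panel: four haplotypes over six sites (illustrative data only).
panel = [
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
target = [0, 0, 1, 0, 1, 0]

# Keep the two closest haplotypes as the "surrogate family" for this target.
surrogates = closest_k(target, panel, k=2)
print(surrogates)
```

The cost of the downstream HMM depends on the number of states it must consider, so restricting it to k selected haplotypes rather than the full panel is what would keep running time manageable as panels grow. In this sketch the selection itself is a simple linear scan with a sort, which is adequate for illustration but not tuned for panels of thousands of genomes.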