Chunk #74 — Discussion — Computational strategies for imputation with large, sequence-based reference panels

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

Beagle's basic modeling approach is to combine haplotypes into clusters. This speeds up computation because it reduces the number of HMM states that need to be considered: rather than performing HMM calculations on every haplotype in a dataset, Beagle can run them on a smaller set of clusters. Similar state-reduction techniques are used by GERBIL (Kimmel and Shamir 2005), fastPHASE (Scheet and Stephens 2006), GEDI (Kennedy et al. 2008), and other related methods. By contrast, the basic HMM used by IMPUTE2 and MaCH includes a state for every haplotype. Using all of these states becomes computationally intractable with large panels, so IMPUTE2 restricts the states via its k and khap parameters, which limit the HMM to a subset of haplotypes selected for each individual. The intuition is that the “surrogate family members” identified in this way should include the most informative haplotypes for a particular individual in a particular part of the genome.
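One way to picture the per-individual state restriction is as a nearest-neighbor selection: from a panel of haplotypes, keep only the k that most resemble the individual's current haplotype estimate and use those as HMM states. The sketch below is illustrative only. The paragraph above does not state the selection criterion, so the use of Hamming distance is an assumption here, and the names `hamming` and `select_states` and the toy panel are hypothetical rather than taken from IMPUTE2.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_states(target: Sequence[int],
                  panel: Sequence[Sequence[int]],
                  k: int) -> List[int]:
    """Return the indices of the k panel haplotypes closest to `target`.

    Assumption: closeness is measured by Hamming distance over the shared
    sites. Ties are broken by panel index so the result is deterministic.
    """
    order = sorted(range(len(panel)),
                   key=lambda i: (hamming(target, panel[i]), i))
    return order[:k]


# Toy panel: 4 haplotypes typed at 5 biallelic sites (0/1 alleles).
panel = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0],
]
target = [0, 0, 1, 0, 1]  # current haplotype estimate for one individual

# Only these k haplotypes would serve as HMM states for this individual.
chosen = select_states(target, panel, k=2)
print(chosen)
```

With a panel of N haplotypes, restricting the HMM to k of them shrinks the state space for that individual from N to k; the cost is the distance computation needed to choose them.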