Chunk #17 — Materials and Methods — Scenario A

Source: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.
Embedded: yes

Text

One difference between versions is that IMPUTE v1 analytically integrates over the unknown phase of the genotypes in the study sample, whereas IMPUTE v2 uses Step 1 to integrate over the space of phase reconstructions via Monte Carlo. This step is accomplished for each individual by sampling a pair of paths through the hidden states (haplotypes) of the model, then probabilistically sampling a pair of haplotypes that is consistent with the observed multilocus genotype. Path sampling is a standard operation for HMMs, although in this case the calculation burden can be reduced by careful inspection of the equations for the HMM forward algorithm [11]. By default, the state space of the model in Step 1 includes all of the known haplotypes in and the current-guess haplotypes in . The computational burden of these calculations (both in terms of running time and memory usage) grows quadratically with the number of haplotypes and linearly with the number of SNPs. We later propose approximations to make these calculations more tractable on large datasets.