Chunk #18 — Materials and Methods — Scenario A

Source: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.
Embedded: yes

Text

In Step 2, the algorithm uses each of the haplotypes in (which were sampled in Step 1) to impute new genotypes for SNPs in U. The HMM state space for this step includes only the reference panel haplotypes . The imputation is accomplished by running the forward-backward algorithm for HMMs independently on each haplotype in and then analytically determining the marginal posterior probabilities of the missing alleles – this process is simply a haploid analogue of the one used by IMPUTE v1. If we assume that both haplotypes were sampled from a population that conforms to Hardy-Weinberg Equilibrium (HWE), it is straightforward to convert these allelic probabilities to genotypic probabilities for individual i. Across iterations, we can then sum the posterior probabilities for each missing genotype as if they were weighted counts; at the end of a run, the final Monte Carlo posterior probabilities can be calculated by renormalizing these sums. By contrast with Step 1, the computational burden of these calculations grows only linearly with the number of haplotypes. Consequently, Step 2 can usually avoid the approximations needed to make Step 1 feasible, thereby allowing us to make full use of even very large reference panels.