Chunk #5 — Materials and Methods — IMPUTE2 algorithm

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

To decide which reference haplotypes to copy at a particular point in an IMPUTE2 run, we add an extra step between Steps 1 and 2. After individual i has sampled a new haplotype pair in Step 1, we calculate the Hamming distance from each of these haplotypes to each of the reference haplotypes, using only the overlapping SNPs. Then, separately for each of individual i's haplotypes, we perform Step 2 (haploid imputation of untyped alleles) using only the k_hap nearest reference haplotypes as templates. This procedure is not guaranteed to identify a unique set of k_hap haplotypes, because multiple haplotypes near the k_hap cutoff may have the same Hamming distance. In these situations, we select a random subset of the boundary haplotypes to produce a reference panel with exactly k_hap states. Intuitively, our approach corresponds to imputing each study haplotype from a “custom” reference panel containing close genealogical neighbors. We generally choose larger values for k_hap than for k because phasing updates require evaluation of k^2/2 HMM states per individual per SNP, whereas imputation updates require evaluation of only k_hap states.
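The selection step described above can be illustrated with a minimal Python sketch. It is not the IMPUTE2 source code: the function names (hamming, select_nearest_reference), the 0/1 allele encoding, and the list-based data layout are assumptions made for illustration. The sketch assumes that all haplotypes have already been restricted to the overlapping SNPs, keeps every reference haplotype strictly closer than the cutoff distance, and fills the remaining slots with a random subset of the haplotypes tied at the cutoff.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def select_nearest_reference(study_hap, ref_haps, k_hap, rng=None):
    """Return sorted indices of the k_hap reference haplotypes nearest to study_hap.

    Hypothetical helper: haplotypes are sequences of 0/1 alleles at the
    overlapping SNPs only. Ties at the cutoff distance are broken at random.
    """
    if rng is None:
        rng = random.Random()
    if not 1 <= k_hap <= len(ref_haps):
        raise ValueError("k_hap must lie between 1 and the number of reference haplotypes")

    dist = [hamming(study_hap, ref) for ref in ref_haps]
    cutoff = sorted(dist)[k_hap - 1]  # distance of the k_hap-th nearest haplotype

    # Haplotypes strictly inside the cutoff are always kept.
    inside = [j for j, d in enumerate(dist) if d < cutoff]
    # Haplotypes exactly at the cutoff compete for the remaining slots.
    boundary = [j for j, d in enumerate(dist) if d == cutoff]
    chosen = rng.sample(boundary, k_hap - len(inside))

    return sorted(inside + chosen)


if __name__ == "__main__":
    study = [0, 1, 1, 0, 1]
    refs = [
        [0, 1, 1, 0, 1],  # distance 0
        [0, 1, 1, 0, 0],  # distance 1
        [1, 1, 1, 0, 1],  # distance 1
        [0, 0, 1, 0, 1],  # distance 1
        [1, 0, 0, 1, 0],  # distance 5
    ]
    # With k_hap = 2, index 0 is always kept and one of the three
    # distance-1 haplotypes is drawn at random.
    print(select_nearest_reference(study, refs, k_hap=2))
```

In this toy example, three reference haplotypes are tied at distance 1, so a request for k_hap = 2 cannot be satisfied deterministically; the random draw among the tied haplotypes yields a panel of exactly two states.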