Chunk #27 — Materials and Methods — Choice of conditioning states

Source: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.
Embedded: yes

Text

The structure of the underlying genealogical tree is usually unknown (indeed, knowing the tree would essentially solve the phasing problem), so we frame the list of the k closest haplotypes as a random variable that gets updated for each individual at each MCMC iteration. To sample a new phase configuration for diploid individual i, we choose k conditioning states as follows: for each available non-self haplotype (including current-guess haplotypes for other diploid individuals), we calculate the Hamming distance to each of individual i's current-guess haplotypes and store the minimum of these two distances. Then, we use the k haplotypes with the smallest distances to build the HMM and sample a new pair of haplotypes for individual i.
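The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hamming` and `choose_conditioning_states`, the 0/1 allele encoding, and the tie-breaking rule (lower candidate index first) are all assumptions made for illustration, and the sketch covers only the choice of the k conditioning states, not the HMM sampling that follows.

```python
import heapq
from typing import List, Sequence, Tuple

# Assumed encoding: one integer allele code (e.g. 0/1) per site.
Haplotype = Sequence[int]


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(x != y for x, y in zip(a, b))


def choose_conditioning_states(
    current_pair: Tuple[Haplotype, Haplotype],
    candidates: List[Haplotype],
    k: int,
) -> List[int]:
    """Return the indices of the k candidates closest to individual i.

    `current_pair` holds individual i's two current-guess haplotypes.
    `candidates` must already exclude individual i's own haplotypes; it may
    include current-guess haplotypes of other diploid individuals.
    """
    h1, h2 = current_pair
    # For each candidate, keep the smaller of its two Hamming distances.
    scored = [
        (min(hamming(c, h1), hamming(c, h2)), idx)
        for idx, c in enumerate(candidates)
    ]
    # Ties fall back to candidate index (an assumption; the text gives no tie rule).
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Toy example: four candidate haplotypes over four sites.
pair = ([0, 0, 1, 1], [1, 1, 0, 0])
pool = [
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
closest = choose_conditioning_states(pair, pool, k=2)
```

In an MCMC setting this selection would be repeated for each individual at every iteration, since the current-guess haplotypes, and therefore the distances, change as sampling proceeds.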