
Chunk #60 — Online Methods — Step 3: Approximate HMM decoding

Source
Fast and accurate long-range phasing in a UK Biobank cohort.

Text

For each diploid proband in turn, Eagle identifies candidate surrogate parental haplotypes (from the output of step 2) for use within an HMM (similar to the Li-Stephens model; ref. 46). Eagle then computes an approximate maximum-likelihood path through the HMM using a modified Viterbi algorithm (aggressively pruning the state space to increase speed) and calls phase according to the HMM decoding. Finally, Eagle post-processes the phase calls to correct sporadic errors by explicitly taking into account haplotype frequencies and long IBD. Eagle runs two iterations of this entire procedure. In our N≈150,000 analyses, this step required ≈70% of the total computation time (Supplementary Table 2) and reduced the switch error rate to ≈0.4% after the first HMM iteration and ≈0.3% after the second (Fig. 1c,d). In more detail, our algorithm applies the following three procedures to each proband in turn (in each HMM iteration).
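To make the idea of a pruned Viterbi decode concrete, the following is a minimal sketch and not Eagle's implementation. It assumes a simplified haploid Li-Stephens-style copying model, in which a target haplotype copies from one of several reference haplotypes, and it prunes the state space with a plain beam (keeping only the top-scoring states at each site). Eagle decodes diploid probands and uses its own pruning heuristics. The function names `prune` and `beam_viterbi` and the parameters `switch_prob`, `err_prob`, and `beam` are hypothetical and chosen only for illustration.

```python
import math


def prune(scores, beam):
    """Keep only the `beam` highest-scoring states (illustrative pruning rule)."""
    keep = sorted(scores, key=scores.get, reverse=True)[:beam]
    return {k: scores[k] for k in keep}


def beam_viterbi(target, refs, switch_prob=0.01, err_prob=0.001, beam=2):
    """Approximate best copying path of `target` through `refs`.

    Hidden state = index of the reference haplotype being copied.
    Transitions follow the Li-Stephens form: with probability
    `switch_prob` the copy source is redrawn uniformly from all references.
    """
    n_refs = len(refs)
    log_stay = math.log(1.0 - switch_prob + switch_prob / n_refs)
    log_switch = math.log(switch_prob / n_refs)
    log_match = math.log(1.0 - err_prob)
    log_mismatch = math.log(err_prob)

    def emit(k, m):
        return log_match if refs[k][m] == target[m] else log_mismatch

    # Uniform prior over copy sources at the first site.
    scores = prune({k: -math.log(n_refs) + emit(k, 0) for k in range(n_refs)}, beam)
    backptrs = []

    for m in range(1, len(target)):
        # All switches cost the same, so the best surviving state is the
        # only switch origin that needs to be considered.
        best_prev, best_score = max(scores.items(), key=lambda kv: kv[1])
        new_scores, ptr = {}, {}
        for k in range(n_refs):
            cand, src = best_score + log_switch, best_prev
            # Staying is only possible if state k survived pruning.
            if k in scores and scores[k] + log_stay > cand:
                cand, src = scores[k] + log_stay, k
            new_scores[k] = cand + emit(k, m)
            ptr[k] = src
        scores = prune(new_scores, beam)
        backptrs.append({k: ptr[k] for k in scores})

    # Trace back from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    refs = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
    ]
    target = [0, 0, 0, 0, 1, 1, 1, 1]
    print(beam_viterbi(target, refs, beam=2))
```

In this toy input the target matches reference 0 over the first four sites and reference 1 over the last four, so the decoded path switches copy source once in the middle. Because every switch has the same cost in this model, pruning to a small beam keeps each site's update linear in the number of references while usually preserving the best path. With heavier pruning the result is only an approximation of the full Viterbi path.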