Chunk #25 — Online Methods — Eagle2 core algorithm for phasing a single target sample using a set of reference haplotypes — Step 3: Exploration of the diplotype space
Of course, being able to very rapidly compute the probability of a single haplotype is only useful if we can identify a small subset of haplotype probabilities that are worth computing; to this end, Eagle2 employs a second key idea. We perform a beam search from left to right across the chromosome, propagating a small set of likely diplotypes that represent most of the posterior probability mass in the local diplotype space. This approach essentially focuses computational effort on a small subset of the diplotype space (vs. expending computation evenly across the space as in HMM recursion), which is advantageous when most of the space is probabilistically unfavorable but difficult to discard a priori. Full mathematical and engineering details are provided in the Supplementary Note. The overall complexity of this step is O(MHP) time and O(MP+HP) memory, where H and P are “history length” and “beam width” parameters of the beam search described in the Supplementary Note. We explore the sensitivity of Eagle2 to various parameter choices in Supplementary Tables 11, 12, and 13.