Results: Overview of methods

Source
Fast and accurate long-range phasing in a UK Biobank cohort.

The basic idea of our approach is to harness identity by descent (IBD) arising from distant relatedness (up to ≈12 generations from a common ancestor), which is pervasive within very large cohorts. IBD shared between a proband and other individuals provides a “surrogate family”[13] for the proband, which can then immediately be used to call phase. Although this long-range phasing (LRP) approach is simple in principle, two major challenges have precluded its application to cohorts representing small fractions of large outbred populations. First, identifying IBD is difficult in terms of both accuracy and computational cost; moreover, the most widely used IBD inference methods rely on first phasing the data[31–33], which is the very task LRP is meant to accomplish. Second, LRP by itself can phase only those heterozygous sites of the proband at which at least one IBD-sharing relative is homozygous. For cohorts representing a sizable fraction of a population, 2–5% of sites may be left unphased[13,15], but for smaller cohorts this fraction may exceed 25% even in isolated populations[28], limiting the utility of LRP as a general-purpose method. Our algorithm, Eagle, overcomes the first challenge by employing a new, fast IBD-scanning strategy and the second by introducing an approximate hidden Markov model (HMM) computation that rapidly refines LRP phase calls.
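The surrogate-family idea can be sketched for a single proband and relative under several simplifying assumptions: genotypes are coded as alternate-allele dosages (0, 1, 2) with no missing data or genotyping errors, the pair shares at most one haplotype IBD at any site, and candidate IBD segments are found with a simple stand-in heuristic (long runs of sites containing no opposite homozygotes, i.e. one individual 0 and the other 2). That heuristic is for illustration only and is not Eagle's IBD-scanning strategy, which this paragraph does not describe; the function names and the `min_len` threshold are hypothetical. Within a segment, a proband heterozygous site at which the relative is homozygous can be phased, because the haplotype the proband shares with the relative must carry the relative's allele. Heterozygous sites where the relative is also heterozygous stay unphased, which is the second limitation described above.

```python
"""Toy long-range phasing (LRP) for one proband and one relative (illustrative sketch)."""

# Genotypes are alternate-allele dosages: 0 = hom ref, 1 = het, 2 = hom alt.
# Assumptions (hypothetical simplifications): no missing data, no genotyping
# errors, and at most one haplotype shared IBD between the two individuals.


def candidate_ibd_segments(proband, relative, min_len):
    """Return half-open (start, end) ranges of maximal runs with no opposite
    homozygotes (0 vs 2) that span at least `min_len` sites.

    Hypothetical stand-in heuristic; not Eagle's IBD-scanning strategy.
    """
    n = len(proband)
    segments = []
    start = 0
    for i in range(n + 1):
        # An opposite homozygote is incompatible with sharing a haplotype IBD,
        # so it terminates the current run (and is excluded from any segment).
        at_end = i == n
        opposite = not at_end and {proband[i], relative[i]} == {0, 2}
        if at_end or opposite:
            if i - start >= min_len:
                segments.append((start, i))
            start = i + 1
    return segments


def lrp_phase_calls(proband, relative, segments):
    """For each segment, map each phaseable proband het site to the allele
    (0 = ref, 1 = alt) carried on the proband haplotype shared with the relative."""
    phased = []
    for start, end in segments:
        calls = {
            i: relative[i] // 2  # relative 0 -> allele 0, relative 2 -> allele 1
            for i in range(start, end)
            if proband[i] == 1 and relative[i] in (0, 2)
        }
        phased.append(calls)
    return phased


def unphased_het_sites(proband, phased):
    """Proband het sites that received no call from any segment."""
    called = {i for calls in phased for i in calls}
    return [i for i, g in enumerate(proband) if g == 1 and i not in called]


if __name__ == "__main__":
    proband = [1, 1, 0, 1, 2, 1, 1, 1, 2, 1, 1, 0, 1]
    relative = [2, 0, 0, 1, 2, 0, 2, 1, 0, 2, 1, 0, 0]
    segs = candidate_ibd_segments(proband, relative, min_len=4)
    phased = lrp_phase_calls(proband, relative, segs)
    print(segs)  # [(0, 8), (9, 13)]
    print(phased)  # [{0: 1, 1: 0, 5: 0, 6: 1}, {9: 1, 12: 0}]
    print(unphased_het_sites(proband, phased))  # [3, 7, 10]
```

Calls are only mutually consistent within a segment: the opposite homozygote at site 8 splits the toy data into two segments, and nothing here links which proband haplotype is shared in the first segment to which is shared in the second. Combining many relatives, tolerating genotyping errors, and filling in the remaining unphased heterozygous sites (here sites 3, 7 and 10) are the parts a full method has to add.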