
Chunk #2 — Results — Computational cost

Source
Fast and accurate long-range phasing in a UK Biobank cohort.
Embedded
yes

Text

We benchmarked Eagle against state-of-the-art phasing methods (Beagle [8], HAPI-UR [11], and SHAPEIT2 [12]; see URLs) on subsets of the UK Biobank data set containing N≈15,000, 50,000, or 150,000 samples (Online Methods). After QC, this data set contained 627K autosomal markers with average heterozygosity 0.189 and a minor allele frequency (MAF) distribution typical of genotyping arrays: 43K variants with MAF 0.1–1%, 235K variants with MAF 1–5%, and 349K variants with MAF 5–50%. (Our QC procedure excluded very rare variants with MAF < 0.1%; see Online Methods.) For our first benchmark, we phased only the first 40 cM of chromosome 10 (≈1% of the data, 5,824 SNPs spanning 18 Mb) to allow as many methods as possible to complete in <2 weeks (using up to 10 cores on a single compute node; all methods except HAPI-UR support multithreading over 10 cores). We observed that Eagle achieved a speedup of 1–2 orders of magnitude over other methods across the sample size range (Fig. 2a and Supplementary Table 1), attaining a 14x speedup over SHAPEIT2 and a 12x speedup over HAPI-UR at N≈150,000. (Beagle was unable to phase 1% of the genome in 2
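
As an illustration of the MAF summary quoted above, the following minimal Python sketch computes a minor allele frequency per variant, drops variants below the 0.1% cutoff, and counts the remainder in the three MAF ranges given in the text. It is not the authors' QC pipeline: the function names, the 0/1/2 genotype coding, and the toy variants are assumptions made for this example. Only the bin boundaries and the reported counts (43K, 235K, 349K, totaling 627K) come from the text.

```python
from collections import Counter

# MAF ranges quoted in the text, as (label, lower bound, upper bound).
MAF_BINS = [
    ("0.1-1%", 0.001, 0.01),
    ("1-5%", 0.01, 0.05),
    ("5-50%", 0.05, 0.5),
]

# Approximate per-bin variant counts reported in the text (in units of variants).
REPORTED_COUNTS = {"0.1-1%": 43_000, "1-5%": 235_000, "5-50%": 349_000}


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0, 1, or 2 alternate-allele copies.

    Missing calls are represented as None and ignored (an assumption of this
    sketch, not a detail taken from the paper).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def maf_bin(maf):
    """Return the label of the MAF range containing maf, or None if excluded."""
    for index, (label, low, high) in enumerate(MAF_BINS):
        is_last = index == len(MAF_BINS) - 1
        # Ranges are half-open, except the top one, which includes MAF = 50%.
        if low <= maf < high or (is_last and maf == high):
            return label
    return None  # below the 0.1% cutoff


def summarize(variants):
    """Count variants per MAF range, skipping those below the cutoff."""
    counts = Counter()
    for genotypes in variants.values():
        label = maf_bin(minor_allele_frequency(genotypes))
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy variants, one per outcome.
    toy = {
        "var_excluded": [0] * 999 + [1],   # MAF 0.05%, below the cutoff
        "var_rare": [2] * 99 + [1],        # MAF 0.5%
        "var_low": [0] * 95 + [1] * 5,     # MAF 2.5%
        "var_common": [0, 1, 2, 1],        # MAF 50%
    }
    print(dict(summarize(toy)))
    # The reported bin counts add up to the 627K markers stated in the text.
    print(sum(REPORTED_COUNTS.values()))
```

Folding the alternate-allele frequency with `min(f, 1 - f)` keeps the result in the 0 to 50% range regardless of which allele is labeled as alternate, which is why the top bin ends at 50%.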