Chunk #4 — Results — Phasing performance using genotyped reference panels

Source
Reference-based phasing using the Haplotype Reference Consortium panel.

Text

For the UK Biobank reference-based phasing benchmarks, we generated simulated reference panels by randomly selecting N_ref = 15,000, 30,000, 50,000, or 100,000 samples (not containing trio members) and phasing them using Eagle1 (ref. 13). We phased each subset independently (rather than phasing all samples together and then extracting subsets) to better reflect the phase inaccuracy that would be present in a real reference panel of a given size. We then benchmarked the computational cost and accuracy of reference-based phasing methods by using each panel of 2N_ref haplotypes to phase sets of other UK Biobank target samples including the 70 European-ancestry trio children, which we used for benchmarking accuracy (Online Methods). To cover a wide range of linkage disequilibrium structure, we performed these benchmarks on chromosomes 1, 5, 10, 15, and 20 (a total of 174,595 markers comprising ≈25% of the genome) using Eagle2, SHAPEIT2 (ref. 12), SHAPEIT2 with its --no-mcmc option (which increases speed at the expense of accuracy), and a reference-based version of Eagle1 that we implemented for comparison. We also attempted to benchmark Beagle v4.1 (ref. 25) but found it was too slow for