Chunk #0 — RESULTS — Pre-phasing run-time performance

Source: Fast and accurate genotype imputation in genome-wide association studies through pre-phasing.

Text

To illustrate the computational advantages of pre-phasing, we analyzed a GWAS dataset of 2,490 individuals from the 1958 British Birth Cohort of the Wellcome Trust Case Control Consortium 2 (WTCCC2) [10]. We imputed this dataset from a series of reference panels using related imputation methods that account for phase uncertainty in different ways (Table 1). IMPUTE version 1 (“IMPUTE1”) [11] uses an analytical integration strategy. This was relatively efficient with a reference panel of 60 individuals (41 minutes per genome with 1000 Genomes Pilot data), but the computational burden grew quickly as haplotypes were added to the reference set. By contrast, IMPUTE version 2 (“IMPUTE2”) [6] uses a haplotype sampling strategy. This approach scaled more favorably with larger reference panels, but it still required 512 minutes per genome to impute from the latest 1000 Genomes panel. By comparison, an updated version of IMPUTE2 that uses our proposed approach required a one-time pre-phasing investment of 25 minutes per genome, then just 24 minutes to impute each sample from the largest reference panel. We observed similar trends with MaCH [12] (which typically uses a similar approach to IMPUTE1) and minimac (which performs imputation with pre-phased haplotypes in the MaCH framework), as shown in Supplementary Table 1.
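The run-time figures above can be turned into a rough cost comparison. The sketch below is illustrative arithmetic only, not part of the original analysis: the function names are hypothetical, and it assumes that every repeated imputation run (for example, against successive reference panel releases) costs the same as the largest-panel figures quoted in the text.

```python
# Per-genome run times (minutes) quoted in the text for the largest reference panel.
IMPUTE2_SAMPLING_MIN = 512    # IMPUTE2 with haplotype sampling, no pre-phasing
PREPHASE_ONCE_MIN = 25        # one-time pre-phasing investment
IMPUTE_PREPHASED_MIN = 24     # imputation from pre-phased haplotypes

N_SAMPLES = 2490              # individuals in the GWAS dataset


def minutes_per_genome(n_runs: int, prephased: bool) -> int:
    """Cumulative per-genome cost of n_runs imputation runs.

    Assumption (not from the source): every run costs the same as the
    largest-panel figures above.
    """
    if prephased:
        # Pre-phasing is paid once; only the imputation step is repeated.
        return PREPHASE_ONCE_MIN + n_runs * IMPUTE_PREPHASED_MIN
    return n_runs * IMPUTE2_SAMPLING_MIN


def speedup(n_runs: int) -> float:
    """Ratio of the sampling-based cost to the pre-phased cost."""
    return minutes_per_genome(n_runs, False) / minutes_per_genome(n_runs, True)


def cohort_cpu_hours(n_runs: int, prephased: bool) -> float:
    """Total CPU hours for the whole cohort under the same assumption."""
    return N_SAMPLES * minutes_per_genome(n_runs, prephased) / 60


if __name__ == "__main__":
    for runs in (1, 3):
        print(runs, round(speedup(runs), 1),
              cohort_cpu_hours(runs, False), cohort_cpu_hours(runs, True))
```

Under these assumptions, a single run costs 25 + 24 = 49 minutes per genome with pre-phasing versus 512 minutes without, a speedup of roughly 10.4-fold. Because the pre-phasing cost is paid only once, the ratio approaches 512 / 24 (about 21-fold) as the number of repeated imputation runs grows.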