Chunk #58 — Results — Computational benchmarking

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

For fixed k and k_hap, IMPUTE2’s computational burden scales linearly with the number of study individuals, the number of reference haplotypes, the number of study SNPs, and the number of reference SNPs. Each of these factors makes a different per-unit contribution to the overall running time, with the number of study individuals and the number of reference SNPs having the biggest effect in modern datasets. Extrapolating the numbers from Table 2 to the entire genome and assuming the availability of 100 parallel computer processors, we predict that it would take IMPUTE2 about a day to impute 1000 individuals from a reference panel with thousands of sequenced haplotypes. For investigators with limited computational resources or very large GWAS cohorts, the imputation can be made even faster by prephasing the GWAS genotypes, as we explain in the Discussion.
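
The kind of extrapolation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the calculation behind the paper's estimate: the function names are hypothetical, the sketch assumes the genome is split into equal-cost chunks that run independently on separate processors, and it treats "scales linearly" as direct proportionality in one factor with all other settings held fixed. The input values in the usage lines are placeholders and are not taken from Table 2.

```python
import math


def scale_runtime(base_minutes, base_units, new_units):
    """Rescale a benchmarked running time for a change in ONE factor.

    Assumes running time is directly proportional to that factor
    (e.g. number of study individuals) while everything else is fixed.
    This is a simplification of the linear scaling described in the text.
    """
    return base_minutes * new_units / base_units


def genome_wall_clock_hours(chunk_minutes, chunk_mb, genome_mb, n_processors):
    """Extrapolate a per-chunk running time to a genome-wide wall-clock time.

    Assumes (hypothetically) that every chunk costs the same and that
    chunks are processed in rounds of n_processors at a time.
    """
    n_chunks = math.ceil(genome_mb / chunk_mb)
    rounds = math.ceil(n_chunks / n_processors)
    return rounds * chunk_minutes / 60.0


# Placeholder inputs, chosen only to show the arithmetic.
minutes_for_1000 = scale_runtime(base_minutes=10, base_units=500, new_units=1000)
hours = genome_wall_clock_hours(chunk_minutes=30, chunk_mb=5,
                                genome_mb=3000, n_processors=100)
print(minutes_for_1000, hours)
```

Because each factor has its own per-unit cost, a fuller model would need a separate benchmark for each factor rather than a single proportional rescaling.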