Chunk #30 — Results

Source: Genotype imputation with thousands of genomes.
Embedded: yes
Text

We implemented our approximation within IMPUTE2, which uses an iterative algorithm to impute untyped variants in GWAS datasets. Whereas the original algorithm imputes genotypes from the full set of reference haplotypes, the new approximation imputes each study haplotype from a custom subset of reference haplotypes. (Study genotypes seldom come with known phase, but the haplotypes can be inferred as part of the algorithm.) Each of these custom reference panels includes the k_hap reference haplotypes that have the fewest allele differences with a study haplotype at overlapping SNPs, where k_hap is a user-defined parameter that controls the computational cost of imputation. If this method is applied over a limited genomic region (e.g., a few million base pairs rather than a whole chromosome), we expect the k_hap reference haplotypes to be enriched for those that share recent common ancestry with the study haplotype of interest. We refer to these haplotypes as “surrogate family members” because, like real family members, they may share segments of nearly identical DNA that can be used for imputation. We explore the relationship between k_hap and accuracy in the results that follow, and we provide practical suggestions for applying this approximation in the Discussion.
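The selection rule described above (keep the reference haplotypes with the fewest allele differences from a study haplotype) can be illustrated with a minimal sketch. This is not IMPUTE2's code: the function names, the 0/1 allele coding, and the tie-breaking by reference index are assumptions made for illustration, and the user-defined panel size is written k_hap in the code. The sketch also assumes the study haplotype has already been phased and that all vectors are restricted to the SNPs typed in both the study and the reference panel.

```python
from typing import List, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length allele vectors differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same set of SNPs")
    return sum(x != y for x, y in zip(a, b))


def select_surrogate_family(
    study_hap: Sequence[int],
    ref_haps: Sequence[Sequence[int]],
    k_hap: int,
) -> List[int]:
    """Return indices of the k_hap reference haplotypes closest to study_hap.

    Closeness is the number of allele differences at overlapping SNPs.
    Ties are broken by reference index (an assumption of this sketch).
    """
    if k_hap < 1:
        raise ValueError("k_hap must be a positive integer")
    distances = [hamming_distance(study_hap, ref) for ref in ref_haps]
    order = sorted(range(len(ref_haps)), key=lambda i: (distances[i], i))
    return order[:k_hap]


# Toy data: alleles coded 0/1 at five SNPs typed in both panels.
study_hap = [0, 1, 1, 0, 1]
ref_haps = [
    [0, 1, 1, 0, 1],  # 0 differences
    [1, 0, 0, 1, 0],  # 5 differences
    [0, 1, 0, 0, 1],  # 1 difference
    [0, 0, 1, 1, 1],  # 2 differences
]
custom_panel = select_surrogate_family(study_hap, ref_haps, k_hap=2)
print(custom_panel)  # [0, 2]
```

In this toy case the custom panel for the study haplotype consists of reference haplotypes 0 and 2. Sorting every distance costs more than a partial selection would, but it keeps the sketch short; the point is only that each study haplotype gets its own panel of size k_hap.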