Chunk #9 — RESULTS — Evaluation of imputation accuracy using sequence data

Source: Fast and accurate genotype imputation in genome-wide association studies through pre-phasing.
Embedded: yes

Text

We next extended the experiment by imputing the full set of SNPs in the EUR sequence data (“sequence SNPs”). As expected, the sequence SNPs were imputed less accurately than the WTCCC2 SNPs within each frequency bin. For example, haplotype sampling produced mean R² values of 0.82, 0.86, and 0.92 (for MAF 1–3%, 3–5%, and >5%, respectively) in the array SNP analysis described above, but the accuracy dropped to 0.66, 0.79, and 0.91 when evaluating all sequence SNPs in the same frequency ranges (Table 2). Despite the added difficulty of imputing low-frequency and unascertained variants, pre-phasing was still nearly as effective as haplotype sampling at these SNPs (mean R² of 0.64 vs. 0.66 for MAF 1–3%; Table 2). This analysis also allows us to measure accuracy at SNPs with MAF < 1%, where we observed mean R² values of 0.42 and 0.44 for pre-phasing and haplotype sampling, respectively. Hence, while all methods have lower imputation accuracy at unascertained and low-frequency SNPs, pre-phasing still achieves competitive accuracy at such variants.
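
The per-bin summary reported above can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed allele dosages, which is a common convention but is not defined in this passage. The function names, the half-open bin boundaries, and the choice to compute MAF from the evaluated genotypes themselves are all hypothetical choices made for illustration, not the authors' pipeline.

```python
from collections import defaultdict


def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between genotypes (0/1/2) and dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return None  # correlation is undefined when either vector is constant
    return cov * cov / (var_t * var_d)


def minor_allele_frequency(true_genotypes):
    """MAF estimated from the evaluated genotypes (an assumption of this sketch)."""
    freq = sum(true_genotypes) / (2 * len(true_genotypes))
    return min(freq, 1 - freq)


def maf_bin(maf):
    """Assign a MAF to one of the frequency ranges named in the text."""
    if maf < 0.01:
        return "<1%"
    if maf < 0.03:
        return "1-3%"
    if maf < 0.05:
        return "3-5%"
    return ">5%"


def mean_r2_by_bin(snps):
    """Average R² per MAF bin; `snps` is an iterable of (genotypes, dosages)."""
    r2_values = defaultdict(list)
    for true_genotypes, imputed_dosages in snps:
        r2 = squared_correlation(true_genotypes, imputed_dosages)
        if r2 is not None:  # skip SNPs where R² is undefined
            label = maf_bin(minor_allele_frequency(true_genotypes))
            r2_values[label].append(r2)
    return {label: sum(v) / len(v) for label, v in r2_values.items()}
```

Running `mean_r2_by_bin` once per method (for example, once on pre-phasing dosages and once on haplotype-sampling dosages for the same SNPs) would give per-bin means that can be compared side by side, in the manner of Table 2.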