Chunk #34 — DISCUSSION

Source: Practical considerations for imputation of untyped markers in admixed populations.
Embedded: yes

Text

In addition to issues related to multiple reference populations, current imputation algorithms have at least two other major areas for improvement. First, including phenotype data during imputation is theoretically required for unbiased results [Allison, 2002]. Ignoring phenotype data (or, more generally, dependent variables) during imputation is equivalent to assuming that all sampled individuals are no more closely related than individuals randomly sampled from the population [Marchini and Howie, 2008]. However, cases are more closely related near a disease locus than this assumption implies [Marchini and Howie, 2008]. Incorporating phenotype data into the imputation process yields effect size estimates with smaller bias but larger variance [Epstein and Satten, 2003; Lake et al., 2003; Dai et al., 2006]. The three programs we investigated (BEAGLE, MACH, and PLINK), as well as IMPUTE and fastPHASE, ignore phenotype data. SNPMStat accounts for the phenotype but uses a multinomial model and does not account for long-range linkage disequilibrium [Lin et al., 2008]. Second, BEAGLE and PLINK can infer haplotype phase for certain forms of family data in addition to unrelated individuals. In contrast,