Chunk #3 — INTRODUCTION

Source: Practical considerations for imputation of untyped markers in admixed populations.
Embedded: yes

Text

Most current methods for imputing missing genotypes are extensions of previously described algorithms for inferring haplotype phase and have been reviewed in detail [Browning, 2008]. We classified several existing programs according to the underlying model used to estimate the conditional distribution of haplotype frequencies (Table I). One class is based on localized clusters of haplotypes and includes BEAGLE [Browning and Browning, 2007] and fastPHASE [Scheet and Stephens, 2006]. Both BEAGLE and fastPHASE use a hidden Markov model to cluster haplotypes, but BEAGLE is more parsimonious in that it allows fewer possible transitions and emissions. fastPHASE fixes the number of clusters in the model, whereas BEAGLE dynamically varies the number of clusters at each locus. A second class is based on a multinomial model of haplotype frequencies and includes PLINK [Purcell et al., 2007] and SNPMStat [Lin et al., 2008]. Methods based on the multinomial model estimate haplotype frequencies using an expectation-maximization algorithm, but they can consider only a window of a few markers at a time because, over longer windows, haplotype frequencies become too low for accurate estimation. A third class is explicitly based