Chunk #4 — INITIAL EVALUATION OF IMPUTED GENOTYPES AND HAPLOTYPES — HAPLOTYPING

Source: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.
Embedded: yes

Text

Our approach was inspired by the Markov models commonly used for pedigree analysis [for examples, see Abecasis et al., 2002; Kruglyak et al., 1996; Lander and Green, 1987] and shares several features with other HMMs used to describe sampled haplotypes as a mosaic of a set of reference haplotypes [Daly et al., 2001; Li and Stephens, 2003; Mott et al., 2000; Stephens and Scheet, 2005a]. To evaluate its performance, we simulated two sets of 100 regions, each 1 Mb long, that mimic the degree of linkage disequilibrium (LD) in the HapMap CEU and YRI samples [Schaffner et al., 2005]. In each region, we simulated genotypes for ~200 markers, ascertained to mimic HapMap I allele frequency patterns [Marchini et al., 2006], in 90 individuals, with 2% of the genotypes missing at random. We then used our method to reconstruct individual haplotypes and tallied three measures of haplotyping quality [Marchini et al., 2006]: (1) the number of incorrectly imputed missing genotypes, (2) among heterozygous sites, the number of consecutive sites that are phased incorrectly with respect to each other (this is the