Chunk #20 — IMPUTATION IN DIVERSE POPULATIONS

Source: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.
Embedded: yes

Text

In a second experiment, we evaluated the performance of our method in 927 samples from 52 populations in the Human Genome Diversity Project (HGDP). In a previous evaluation of tag SNP portability, these 927 samples were genotyped for 1,864 SNPs in 32 autosomal regions (average minor allele frequency 0.15–0.24, depending on population) [Conrad et al., 2006]. The regions were selected to represent both high and low linkage disequilibrium (LD) across the genome. Each region spanned ~330 kb, comprising a central “core” region of ~90 kb, where ~60 SNPs were attempted, and a ~120 kb flanking region on either side, where ~12 SNPs were attempted. To evaluate the performance of genotype imputation across these diverse populations, we selected a thinned marker set of 872 SNPs spaced ~10 kb apart across all 32 regions. We then used these 872 SNPs to impute genotypes at the remaining 992 SNPs (872 + 992 = 1,864), which served as the basis for evaluating our approach.
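
The thinning step described above can be illustrated with a minimal sketch. The text does not say how the 872 SNPs were chosen, so the greedy rule below (keep a SNP once it lies at least 10 kb past the last kept SNP) is an assumption, and the function name `thin_markers` and the toy positions are hypothetical. The sketch only shows how a region's markers could be split into a retained set used for imputation and a masked set held out for evaluation.

```python
from typing import List, Tuple


def thin_markers(
    positions: List[int], min_spacing: int = 10_000
) -> Tuple[List[int], List[int]]:
    """Split SNP positions (in bp) into a thinned set and a masked set.

    Assumed greedy rule, not taken from the paper: walk the positions in
    order and keep a SNP only if it is at least `min_spacing` bp beyond
    the previously kept SNP. Every other SNP is masked.
    """
    kept: List[int] = []
    masked: List[int] = []
    last_kept = None
    for pos in sorted(positions):
        if last_kept is None or pos - last_kept >= min_spacing:
            kept.append(pos)
            last_kept = pos
        else:
            masked.append(pos)
    return kept, masked


# Hypothetical positions within one region (bp from the region start).
toy_positions = [0, 2_500, 9_000, 10_000, 14_000, 21_000, 30_500, 33_000]
kept, masked = thin_markers(toy_positions)

# Every SNP ends up in exactly one of the two sets, mirroring the
# 872 retained + 992 masked = 1,864 genotyped SNPs in the experiment.
assert len(kept) + len(masked) == len(toy_positions)
```

In an evaluation of this kind, genotypes at the `kept` positions would be passed to the imputation method, and genotypes imputed at the `masked` positions could then be compared with the experimentally observed ones.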