
Chunk #8 — Online methods — Genotype calling method leveraging existing haplotype calls

Source
A reference panel of 64,976 haplotypes for genotype imputation.

Text

We called genotypes from the genotype likelihoods computed on the HRC samples by extending the SNPTools algorithm (ref. 22) to leverage pre-existing haplotypes available from each cohort. Like other phasing and calling approaches (refs. 8,10), SNPTools is a Markov chain Monte Carlo (MCMC) method in which each sample's haplotypes and genotypes are iteratively updated using the current estimates of all other samples. Each sample is updated with a low-complexity hidden Markov model (HMM) that has just four states, corresponding to a set of four "surrogate parent" haplotypes. The MCMC sampler uses a Metropolis-Hastings (MH) step to sample the set of surrogate parents. With large sample sizes, the search space for these surrogate haplotypes is huge, which results in low acceptance rates for the sampler. Our extension, called GLPhase (see URLs), uses the pre-existing haplotypes to restrict the set of haplotypes from which the MH sampler may choose surrogate parents. For each individual, we restrict the search space to the 200 haplotypes that most closely match the individual's two pre-existing haplotypes under a Hamming distance metric (100 for each haplotype). We run the method on
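The candidate restriction described above, and how it feeds the MH step, can be sketched in Python. This is a minimal illustration under stated assumptions, not GLPhase's implementation. Haplotypes are assumed to be 0/1 lists over the same sites, ties in Hamming distance are broken by panel index, and the two 100-haplotype lists are kept disjoint so the set has 200 distinct members when the panel is large enough. The function names (`candidate_surrogates`, `mh_update`) and the `log_lik` callable, which stands in for the four-state HMM likelihood, are hypothetical. The proposal (swap one of the four surrogate parents for a uniformly drawn candidate) is likewise an assumed simple symmetric proposal.

```python
import math
import random
from typing import Callable, List, Sequence, Set

Haplotype = Sequence[int]  # assumed encoding: one 0/1 allele per site


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(x != y for x, y in zip(a, b))


def nearest(query: Haplotype, panel: Sequence[Haplotype], k: int,
            skip: Set[int]) -> List[int]:
    """Indices of the k panel haplotypes closest to `query`, ignoring `skip`.

    Ties are broken by panel index so the result is deterministic.
    """
    scored = sorted((hamming(query, h), i)
                    for i, h in enumerate(panel) if i not in skip)
    return [i for _, i in scored[:k]]


def candidate_surrogates(own_haps: Sequence[Haplotype],
                         panel: Sequence[Haplotype],
                         per_hap: int = 100,
                         exclude: Sequence[int] = ()) -> List[int]:
    """Restrict one individual's surrogate-parent search space.

    For each of the individual's two pre-existing haplotypes, keep the
    `per_hap` nearest panel haplotypes (2 * 100 = 200 by default).
    `exclude` holds panel indices to ignore, e.g. the individual's own
    haplotypes. Keeping the two lists disjoint is an assumption.
    """
    chosen: List[int] = []
    skip = set(exclude)
    for hap in own_haps:
        picks = nearest(hap, panel, per_hap, skip)
        chosen.extend(picks)
        skip.update(picks)  # the second list cannot repeat the first
    return chosen


def mh_update(surrogates: Sequence[int], candidates: Sequence[int],
              log_lik: Callable[[Sequence[int]], float],
              rng: random.Random) -> List[int]:
    """One Metropolis-Hastings update of the four surrogate parents.

    Proposal (assumed, not taken from GLPhase): replace one uniformly chosen
    slot with a haplotype drawn uniformly from `candidates`. If the current
    surrogates are themselves in `candidates`, the proposal is symmetric,
    so acceptance depends only on the likelihood ratio. `log_lik` is a
    hypothetical stand-in for the four-state HMM log-likelihood.
    """
    proposal = list(surrogates)
    proposal[rng.randrange(len(proposal))] = rng.choice(candidates)
    log_ratio = log_lik(proposal) - log_lik(surrogates)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal  # accept the proposed surrogate set
    return list(surrogates)  # reject: keep the current state


if __name__ == "__main__":
    # Toy panel of 5 haplotypes over 4 sites; the individual's own
    # haplotypes are not in the panel, so nothing needs excluding.
    panel = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1]]
    cands = candidate_surrogates(([0, 0, 0, 0], [1, 1, 1, 1]), panel, per_hap=2)
    print(cands)  # [0, 2, 1, 3]
    state = mh_update(cands, cands, lambda s: 0.0, random.Random(1))
    print(state)
```

Because every proposal is drawn from a few hundred haplotypes that already resemble the individual, rather than from the whole panel, far fewer proposals are poor matches, which is the low-acceptance problem the restriction is meant to address.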