Chunk #19 — Methods — The phasing model for low coverage sequence data

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

In each iteration we must sample a pair of haplotypes (h1, h2) for the ith individual given both R and H. To do so, we adapted SHAPEIT's parsimonious representation of the possible haplotypes to handle genotype likelihoods (GLs). We divide the region being phased into a number, C, of consecutive non-overlapping segments such that each segment contains 8 possible haplotypes consistent with the GLs. For bi-allelic variants, this means that each segment spans 3 sites (2^3 = 8), and we will see in the next section how this number can be increased. We use S_l ∈ {1, …, C} to denote the segment that contains the lth SNP, and b_s and e_s to denote the first and last sites included in the sth segment, respectively. We use A_{lb} to denote the allele carried at the lth site by the bth consistent haplotype. We can now represent a possible haplotype as a vector of labels X = {X_1, …, X_L}, where X_l denotes the label of the haplotype at the lth site, within segment S_l. The segmentation implies