Chunk #22 — Methods — The phasing model for low coverage sequence data

Source
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

Text

Based on the segmentation of the chromosome into C segments, we employ a Markov model similar to the one introduced in the SHAPEIT2 method [8]. It can be written as:

$$P(X^{1}, X^{2} \mid H, R) = P(X^{1}_{\{1\}}, X^{2}_{\{1\}} \mid H, R) \prod_{s=2}^{C} P(X^{1}_{\{s\}}, X^{2}_{\{s\}} \mid X^{1}_{\{s-1\}}, X^{2}_{\{s-1\}}, H, R) \tag{3}$$

The idea is to first sample a diplotype for the first segment, s = 1, from $P(X^{1}_{\{1\}}, X^{2}_{\{1\}} \mid H, R)$, and then a diplotype for each successive segment from $P(X^{1}_{\{s\}}, X^{2}_{\{s\}} \mid X^{1}_{\{s-1\}}, X^{2}_{\{s-1\}}, H, R)$. The scheme we use is described by the following steps:

1. A pair of haplotypes in the first segment with labels (i, j) is sampled with probability proportional to $P(X^{1}_{\{1\}} = i, X^{2}_{\{1\}} = j \mid H, R)$.
2. While s ≤ C, a pair of haplotypes (d, f) for the s-th segment is sampled, given the previously sampled pair (i, j) for the (s−1)-th segment, with probability proportional to $P(X^{1}_{\{s\}} = d, X^{2}_{\{s\}} = f \mid X^{1}_{\{s-1\}} = i, X^{2}_{\{s-1\}} = j, H, R)$.
3. Set s = s + 1.
4. If s = C + 1 then stop, else go to Step 2.
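The sampling scheme above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the names `sample_pair`, `sample_diplotype`, and `toy_transition` are hypothetical, and the weights are made-up toy numbers standing in for the unnormalized probabilities that the model would compute from H and R. The sketch also assumes the loop counter starts at s = 2, which the text implies through the product in equation (3) but does not state.

```python
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]          # (label of haplotype 1, label of haplotype 2)
Weights = Dict[Pair, float]     # unnormalized probability for each pair


def sample_pair(weights: Weights, rng: random.Random) -> Pair:
    """Draw one pair with probability proportional to its weight."""
    pairs = list(weights)
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]


def sample_diplotype(
    first_weights: Weights,
    transition_weights: Callable[[int, Pair], Weights],
    n_segments: int,
    rng: random.Random,
) -> List[Pair]:
    """Sample one pair of haplotype labels per segment, left to right.

    transition_weights(s, prev) is a hypothetical callback returning the
    unnormalized weights of every pair (d, f) in segment s, given the pair
    `prev` = (i, j) sampled in segment s - 1.
    """
    # Step 1: sample the pair (i, j) for the first segment.
    path = [sample_pair(first_weights, rng)]
    s = 2  # assumption: the counter starts at the second segment
    # Steps 2 to 4: sample each later segment given the previous pair.
    while s <= n_segments:
        prev = path[-1]
        path.append(sample_pair(transition_weights(s, prev), rng))
        s += 1
    return path


# Toy setup (assumed values): two haplotype labels per segment, and a
# transition that favours keeping the same pair across segments.
LABELS = (0, 1)
ALL_PAIRS = [(d, f) for d in LABELS for f in LABELS]


def toy_transition(s: int, prev: Pair) -> Weights:
    return {p: (5.0 if p == prev else 1.0) for p in ALL_PAIRS}


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_start = {p: 1.0 for p in ALL_PAIRS}
    print(sample_diplotype(uniform_start, toy_transition, n_segments=4, rng=rng))
```

Because each draw conditions only on the pair chosen for the previous segment, a full diplotype is produced in a single left-to-right pass over the C segments, with one weighted draw per segment.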