Chunk #25 — Methods — The phasing model for low coverage sequence data

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

We use the SHAPEIT2 model for the terms $P(X_{\{1\}}^1, X_{\{1\}}^2 \mid H)$ and $P(X_{\{s\}}^1, X_{\{s\}}^2, X_{\{s-1\}}^1, X_{\{s-1\}}^2 \mid H)$. We do not give more details here since a complete description can be found in the SHAPEIT2 paper [8].

The genotype likelihoods enter the model in the term $P(R \mid X^1, X^2)$ as a product over all $L$ sites:

$$P(R \mid X^1, X^2) = \prod_{l=1}^{L} P(R \mid G_l = A_{lX_l^1} + A_{lX_l^2})$$

which implies that

$$P(R \mid X_{\{1\}}^1, X_{\{1\}}^2) = \prod_{l=b_1}^{e_1} P(R \mid X_l^1, X_l^2)$$

$$P(R \mid X_{\{s\}}^1, X_{\{s\}}^2, X_{\{s-1\}}^1, X_{\{s-1\}}^2) = \prod_{l=b_{s-1}}^{e_s} P(R \mid X_l^1, X_l^2)$$
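As an illustration of how the per-site product factorizes over segments, the following is a minimal sketch and not the SHAPEIT2 implementation. It assumes biallelic sites with alleles coded 0/1, a hypothetical table `gl[l][g]` holding $P(R \mid G_l = g)$ for $g \in \{0, 1, 2\}$, and a hypothetical helper `segment_log_likelihood`. The data are toy values chosen for the example, and the product is accumulated in log space to avoid underflow.

```python
import math


def segment_log_likelihood(gl, hap1, hap2, begin, end):
    """Return sum over l in [begin, end] of log P(R | G_l = hap1[l] + hap2[l]).

    gl[l][g] is the genotype likelihood at site l for genotype g (0, 1 or 2).
    hap1 and hap2 hold the 0/1 alleles carried by the two haplotypes.
    Site indices are 0-based and the range is inclusive at both ends.
    """
    total = 0.0
    for l in range(begin, end + 1):
        genotype = hap1[l] + hap2[l]  # number of copies of allele 1 at site l
        total += math.log(gl[l][genotype])
    return total


# Toy genotype likelihoods for L = 4 sites (illustrative values only).
gl = [
    [0.90, 0.09, 0.01],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
    [0.70, 0.20, 0.10],
]
hap1 = [0, 1, 1, 0]
hap2 = [0, 0, 1, 0]

# Product over all L sites, and over two hypothetical segments
# covering sites 0..1 and 2..3.
full = segment_log_likelihood(gl, hap1, hap2, 0, 3)
first = segment_log_likelihood(gl, hap1, hap2, 0, 1)
second = segment_log_likelihood(gl, hap1, hap2, 2, 3)
```

Because the likelihood is a product over sites, the log-likelihoods of non-overlapping segments add up to the log-likelihood of the full region: here `first + second` equals `full` up to floating-point error. The term for a pair of adjacent segments is obtained in the same way by letting the range run from the first site of segment $s-1$ to the last site of segment $s$.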