
Chunk #32 — Methods — Using a haplotype scaffold

Source
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded
yes

Text

We denote by $F$ the pair of haplotypes derived from the SNP array for the $i$th individual. The goal is now to sample a pair of haplotypes from $P(X_1, X_2 \mid H, R, F)$ such that they are fully consistent with $F$. The scaffold $F$ imposes a set of hard constraints on the space of possible haplotypes generated by the sampling scheme, as illustrated in Supplementary Figure 3c. For the first segment, $s = 1$, we set $P(X_1^{\{1\}}, X_2^{\{1\}} \mid H, R, F) = P(X_1^{\{1\}}, X_2^{\{1\}} \mid H, R)$ when the pair of haplotypes defined by $(X_1^{\{1\}}, X_2^{\{1\}})$ is fully consistent with $F$ over the first segment, and 0 otherwise. Similarly, we define $P(X_1^{\{s\}}, X_2^{\{s\}} \mid X_1^{\{s-1\}}, X_2^{\{s-1\}}, H, R, F) = P(X_1^{\{s\}}, X_2^{\{s\}} \mid X_1^{\{s-1\}}, X_2^{\{s-1\}}, H, R)$ when the haplotype pair defined by $(X_1^{\{s\}}, X_2^{\{s\}}, X_1^{\{s-1\}}, X_2^{\{s-1\}})$ is fully consistent with $F$ over segments $s$ and $s-1$, and 0 otherwise. In practice, setting to 0 the transition probabilities between successive segments that are inconsistent with $F$ means that it becomes impossible to sample haplotypes inconsistent with $F$ across the full set of $L$ sites.
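
The masking described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`matches`, `consistent`, `constrain`, `sample_path`, and the toy data) is hypothetical. It assumes that each segment has a small explicit list of candidate haplotype pairs, that untyped scaffold sites are marked `None`, and that consistency is checked in a fixed labelling ($X_1$ against $F_1$, $X_2$ against $F_2$). Renormalising each masked row locally is a simplification: an exact sampler from the constrained distribution would also need a backward pass.

```python
import random

def matches(hap, scaf):
    """True if hap agrees with scaf at every typed scaffold site (None = untyped)."""
    return all(f is None or h == f for h, f in zip(hap, scaf))

def consistent(pair, scaffold_pair):
    """Ordered check (an assumption of this sketch): X1 against F1, X2 against F2."""
    return matches(pair[0], scaffold_pair[0]) and matches(pair[1], scaffold_pair[1])

def constrain(probs, keep):
    """Set probabilities of disallowed outcomes to 0 and renormalise the rest."""
    masked = [p if k else 0.0 for p, k in zip(probs, keep)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no candidate pair is consistent with the scaffold")
    return [p / total for p in masked]

def join(a, b):
    """Concatenate two per-segment pairs into one pair spanning both segments."""
    return (a[0] + b[0], a[1] + b[1])

def sample_path(candidates, init, trans, scaffold, rng=random):
    """Sample one candidate index per segment under the scaffold constraints.

    candidates[s]  : list of haplotype pairs (tuple, tuple) for segment s
    init           : probabilities over candidates[0]
    trans[s-1][i]  : probabilities over candidates[s] given candidate i in s-1
    scaffold[s]    : scaffold pair (F1, F2) restricted to segment s
    """
    # First segment: mask the initial distribution.
    keep = [consistent(c, scaffold[0]) for c in candidates[0]]
    i = rng.choices(range(len(init)), weights=constrain(init, keep))[0]
    path = [i]
    # Later segments: mask transitions using consistency over segments s-1 and s.
    for s in range(1, len(candidates)):
        prev = candidates[s - 1][i]
        span = join(scaffold[s - 1], scaffold[s])
        keep = [consistent(join(prev, c), span) for c in candidates[s]]
        row = constrain(trans[s - 1][i], keep)
        i = rng.choices(range(len(row)), weights=row)[0]
        path.append(i)
    return path

# Toy data: two segments of two sites each; the array types one site per segment.
candidates = [
    [((0, 1), (1, 0)), ((1, 1), (0, 0)), ((0, 0), (1, 1))],
    [((0, 0), (1, 1)), ((1, 1), (0, 0))],
]
scaffold = [((0, None), (1, None)), ((None, 0), (None, 1))]
init = [0.2, 0.5, 0.3]
trans = [[[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]]]
path = sample_path(candidates, init, trans, scaffold, rng=random.Random(0))
```

In this toy setup the second candidate of the first segment and the second candidate of the second segment disagree with the scaffold, so `path[0]` is always 0 or 2 and `path[1]` is always 0, whatever the seed.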