Chunk #19 — 2 METHODS — 2.4 Modeling base and mapping qualities

Source
SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.

Text

The model shown in Figure 1C assumes that aij is observed (it is a shaded node in the graph) and is therefore treated as correct. However, each nucleotide's contribution to the allelic counts carries uncertainty, expressed by its base and mapping qualities. We propose a soft (or probabilistic) weighting scheme that down-weights the influence of low-quality base and mapping calls without discarding them altogether. To model this, we make aij an unobserved quantity, as shown in Figure 1D, and instead observe soft evidence on it in the form of probabilities, represented by the observed base qualities qij∈[0, 1]. Similarly, we introduce unobserved binary random variables zij∈{0, 1} indicating whether read j is correctly aligned, with soft evidence in the form of probabilities represented by the observed mapping qualities rij∈[0, 1]. The conditional probability distributions p(qij|aij, zij) and p(rij|zij) are given in Figure 3. The input data are thus q1:T, r1:T, and the corresponding likelihood for each location i is obtained by marginalizing out a and z as follows: (5) (6) (7)
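The marginalization over a and z can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it does not reproduce Figure 3: the conditional distributions in the code are assumptions chosen for illustration (q is read as the probability that a correctly aligned read supports the reference allele, an incorrectly aligned read is uninformative about the allele, and the prior on z is uniform). The parameter `mu` is likewise hypothetical and stands for the probability of the reference allele under one genotype. The function names are invented for this example.

```python
from itertools import product


def read_likelihood(q, r, mu):
    """Likelihood of one read's soft evidence (q, r), summing out a and z.

    All conditional distributions below are illustrative assumptions,
    not the ones given in Figure 3 of the paper.
    """
    total = 0.0
    for a, z in product((0, 1), repeat=2):
        # Assumed: a = 1 means the read truly carries the reference allele.
        p_a = mu if a == 1 else 1.0 - mu
        # Assumed: uniform prior on whether the read is correctly aligned.
        p_z = 0.5
        if z == 1:
            # Assumed: base quality is soft evidence for the allele.
            p_q = q if a == 1 else 1.0 - q
        else:
            # Assumed: a misaligned read says nothing about the allele.
            p_q = 0.5
        # Assumed: mapping quality is soft evidence for correct alignment.
        p_r = r if z == 1 else 1.0 - r
        total += p_a * p_z * p_q * p_r
    return total


def position_likelihood(qs, rs, mu):
    """Product of per-read likelihoods at one position (reads assumed independent)."""
    lik = 1.0
    for q, r in zip(qs, rs):
        lik *= read_likelihood(q, r, mu)
    return lik
```

Under these assumptions, a read with mapping quality 0 contributes the same constant factor for every value of `mu`, so it has no influence on which genotype is preferred, while a read with q = r = 1 behaves like a fully observed allele. This is the down-weighting behaviour the text describes: low-quality reads are kept but carry little weight.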