Chunk #7 — 1 INTRODUCTION — 1.3 Related work

Source: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.
Embedded: yes

Text

A simple way to detect SNV locations would be to compute, at each location i, the fraction fi of reads that match the reference, and then to call as SNVs those locations where fi is below some threshold. In the example in Figure 1A, applying a suitable threshold would successfully discard all columns (including the two columns that have singleton non-reference reads, which may be due to base-calling errors) except the two containing the SNVs. A critical flaw with this approach is that it ignores the confidence we have in our estimate of fi. Intuitively, we can trust our estimate more at locations with greater depth (larger Ni). This idea has been applied by Morin et al. (2008), who required a read depth of Ni ≥ 6 and bi ≥ 2 reads supporting the variant allele, with the additional requirement that the non-reference allele be represented by at least 33% of all reads at that site. This should eliminate SNVs with weak supporting evidence, but it sorts the data into two discrete classes (SNV or not) without explicitly providing confidence estimates on the prediction. Moreover, in transcriptome data, the number
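
The two hard-threshold rules discussed in this section can be sketched as follows. This is a minimal illustration, not code from Morin et al. (2008) or from SNVMix: the function and parameter names are hypothetical, and the 0.75 cutoff in the demo is an arbitrary value chosen for illustration. Only the defaults of depth ≥ 6, variant reads ≥ 2 and variant fraction ≥ 33% are taken from the text.

```python
def reference_fraction(n_ref, depth):
    """Fraction of reads at a site that match the reference (fi in the text)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return n_ref / depth


def naive_call(n_ref, depth, max_ref_fraction):
    """Call an SNV when the reference fraction falls below a cutoff.

    max_ref_fraction is a hypothetical parameter; it takes no account of
    how many reads the estimate is based on.
    """
    return reference_fraction(n_ref, depth) < max_ref_fraction


def depth_filtered_call(n_variant, depth, min_depth=6,
                        min_variant_reads=2, min_variant_fraction=0.33):
    """Hard filter using the thresholds the text attributes to Morin et al. (2008).

    Requires sufficient depth (Ni), enough variant-supporting reads (bi),
    and a minimum fraction of non-reference reads.
    """
    if depth < min_depth or n_variant < min_variant_reads:
        return False
    return n_variant / depth >= min_variant_fraction


if __name__ == "__main__":
    # (variant reads, depth) pairs; made-up sites for illustration only.
    sites = [(1, 2), (2, 6), (2, 7), (40, 100)]
    for n_variant, depth in sites:
        n_ref = depth - n_variant  # assumes only two alleles are observed
        print(n_variant, depth,
              naive_call(n_ref, depth, max_ref_fraction=0.75),
              depth_filtered_call(n_variant, depth))
```

Both functions return only a yes/no call. A site with 1 variant read out of 2 passes the naive rule but fails the depth filter, and neither rule says how confident the call is, which is the limitation noted above.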