Chunk #12 — 1 INTRODUCTION — 1.3 Related work

Source: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.
Embedded: yes

Text

The statistical models we propose in this contribution provide posterior probabilities on SNV predictions, removing the need for depth thresholds, and use an expectation maximization (EM) learning algorithm to fit the model to data, removing the need to set model parameters by hand. We also show how to explicitly model base and mapping qualities, and explore how quality thresholds can be used in combination with probabilistic weighting. We show that these attributes of the model result in increased accuracy compared with Maq's SNV caller and depth threshold-based methods. We evaluate the model on real data derived from 16 ovarian cancer transcriptomes sequenced using NGS, and on a lobular breast cancer genome sequenced to >40x coverage (Shah et al., 2009b). For all cases, we obtained high-density genotyping array data for orthogonal comparisons. Finally, we demonstrate results on 497 positions from the breast cancer genome that were subjected to Sanger sequencing and thus constitute a ‘ground truth’ dataset for benchmarking.
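
To make the idea of a posterior probability on an SNV prediction concrete, the following is a minimal sketch and not the model as specified in this paper. It assumes a three-component binomial mixture over genotypes (homozygous reference, heterozygous, homozygous variant). The function name `genotype_posteriors` and the values in `MU` and `PI` are hypothetical, chosen by hand for illustration; in the approach described above such parameters would be fit to data with EM rather than set manually, and base and mapping qualities are ignored here.

```python
from math import comb

def genotype_posteriors(ref_count, depth, mu, pi):
    """Posterior over genotypes given the number of reference reads.

    mu[k]: assumed probability of observing a reference read under genotype k.
    pi[k]: assumed prior probability of genotype k.
    """
    # Prior times binomial likelihood for each genotype.
    weighted = [
        p * comb(depth, ref_count) * m**ref_count * (1 - m) ** (depth - ref_count)
        for m, p in zip(mu, pi)
    ]
    total = sum(weighted)
    # Normalise so the posteriors sum to one.
    return [w / total for w in weighted]

# Hypothetical, hand-picked parameters (NOT fitted values from the paper).
MU = (0.99, 0.5, 0.01)   # hom-ref, het, hom-variant
PI = (0.8, 0.1, 0.1)

# 6 reference reads out of 10: a mixed site.
post_mixed = genotype_posteriors(ref_count=6, depth=10, mu=MU, pi=PI)
p_snv_mixed = post_mixed[1] + post_mixed[2]

# 10 reference reads out of 10: no variant evidence.
post_ref = genotype_posteriors(ref_count=10, depth=10, mu=MU, pi=PI)
p_snv_ref = post_ref[1] + post_ref[2]

print(f"P(SNV | 6/10 ref reads)  = {p_snv_mixed:.4f}")
print(f"P(SNV | 10/10 ref reads) = {p_snv_ref:.4f}")
```

Under these assumed parameters the call is a probability rather than a yes/no decision made at a fixed depth cutoff, which is the property the paragraph above contrasts with depth threshold-based methods.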