paperKB
coga / coga-kb
Help
Sign in

Chunk #4 — Background

Source
Statistical modeling for sensitive detection of low-frequency single nucleotide variants.
Embedded
yes

Text

By analyzing previous efforts, our group proposed a framework (Yangyang Hao XX, Li L, Nakshatri H, Edenberg HJ, Liu Y. RareVar: A Framework for Detecting Low Frequency Single Nucleotide Variants, submitted) to first generated position specific error model (PSEM) using genome sequence contexts for candidate SNV identification and then apply a machine-learning model to refine the candidates. Testing on an Ion Proton benchmark dataset, our framework outperforms existing methods, especially at 0.5 % to 3 % frequencies. However, the potential to improve PSEM performances on SNVs with close to sequencing error rates by implementing more sophisticated statistical modeling and the generalizability and adaptiveness of PSEM remain untested. In this research, we explored what distributions fit the DNA-Seq data error rates modeling as well as the possibility of improved position specific error rate prediction for higher precision and recall on SNVs down to 0.5 % frequency. Further, we evaluated how different sequencing technologies affect the behavior of PSEM and the generalizability and adaptiveness of the PSEM framework.