Results: Performance evaluation on Ion Proton testing benchmark

Source: Statistical modeling for sensitive detection of low-frequency single nucleotide variants.

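The comparison that follows ranks Poisson, negative binomial (NB) and zero-inflated variants by F1 score. To help in reading it, here is a minimal Python sketch, using simple fixed parameters, of what "considering zero-inflation or dispersion" does to an error-count distribution. It defines the Poisson and NB probability mass functions, a wrapper that adds an extra point mass at zero, the upper-tail probability P(X >= x) that a caller could compare against a threshold, and the F1 score (harmonic mean of precision and recall). All function names and the demo values (mean error count `mu`, NB `size`, zero-inflation weight `pi`, depth 2000) are hypothetical. This is not the authors' GLM fitting or calling procedure, and the regression on covariates is omitted.

```python
import math
from typing import Callable

Pmf = Callable[..., float]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson error count with mean mu (> 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def nb_pmf(k: int, mu: float, size: float) -> float:
    """Negative binomial P(X = k) with mean mu and size parameter.

    Variance is mu + mu**2 / size: a smaller size means more overdispersion.
    """
    p = size / (size + mu)  # success probability in the (size, p) form
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * math.log(p) + k * math.log(1.0 - p))


def zero_inflated(pmf: Pmf, pi: float) -> Pmf:
    """Mix a count pmf with an extra point mass of weight pi at zero."""
    def zi_pmf(k: int, *params: float) -> float:
        base = (1.0 - pi) * pmf(k, *params)
        return base + pi if k == 0 else base
    return zi_pmf


def upper_tail(pmf: Pmf, x: int, *params: float) -> float:
    """P(X >= x): chance of seeing x or more error reads under the model."""
    return max(0.0, 1.0 - sum(pmf(k, *params) for k in range(x)))


if __name__ == "__main__":
    # Hypothetical site: mean error count mu = 2 reads and 10 observed
    # alternate reads (0.5 % of an assumed depth of 2000).
    mu, observed = 2.0, 10
    models = {
        "Poisson": (poisson_pmf, (mu,)),
        "ZI-Poisson": (zero_inflated(poisson_pmf, 0.3), (mu,)),
        "NB": (nb_pmf, (mu, 1.0)),
        "ZINB": (zero_inflated(nb_pmf, 0.3), (mu, 1.0)),
    }
    for name, (pmf, params) in models.items():
        tail = upper_tail(pmf, observed, *params)
        print(f"{name:>10}: P(X >= {observed}) = {tail:.2e}")
    print(f"F1 at P=0.823, R=0.975: {f1_score(0.823, 0.975):.3f}")
```

Holding `mu` fixed across models isolates the change in shape: at the same mean, the NB tail is much heavier than the Poisson tail, while zero-inflation (with everything else fixed) moves mass to zero and scales the tail by `1 - pi`. In a fitted GLM, `mu`, `size` and `pi` would instead be estimated from the data, so adding zero-inflation also shifts the fitted mean of the count component. The final line applies F1 to the ZINB averages quoted below for allele frequencies above 1 %. The F1 of averaged precision and recall only approximates the average F1 across frequencies.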

Next, we examined how each distribution performed across allele frequencies. Fig. 4 shows well-separated F1 score levels, confirming that SNVs at lower allele frequencies are harder to identify regardless of the distribution used. In addition, the marked gap between 0.5 % and the other allele frequencies indicates that the detection limit is around 0.5 % under the current sequencing platform and depth. Meanwhile, the benefit of appropriate modeling is evident when comparing the performance of all distributions on SNVs at 0.5 % allele frequency. Relative to the Poisson GLM, accounting for either zero-inflation or dispersion boosted the F1 score by about 0.2 at 0.5 %, and accounting for both in the ZINB GLM raised the F1 score by about 0.1 more. Interestingly, compared with the second-best model, the NB GLM, the ZINB GLM improved both precision and recall, which underscores the need to model zero-inflation to obtain more accurate error rate estimates. Furthermore, for SNVs with allele frequency greater than 1 %, the ZINB GLM achieves an average recall of 97.5 % with an average precision of 82.3 %. To summarize,