Chunk #19 — Results — Application of ZINB PSEM on Illumina MiSeq data

Source: Statistical modeling for sensitive detection of low-frequency single nucleotide variants.
Embedded: yes

Text

To evaluate whether the differences in GLM coefficients affect performance across allele frequencies, we applied the four GLMs trained on CAL_A to the other three calibration datasets and analyzed recall, precision and F1 score by allele frequency on the combined dataset. As shown in Fig. 5, and as with the Ion Proton dataset, SNVs of lower allele frequency are more difficult to identify. However, at 0.5 % to 1 % allele frequency, unlike in the Ion Proton dataset, the NB GLM achieved a much higher F1 score than ZIP. A closer look at the performance profiles shows that the noticeable drop in recall of NB relative to ZIP seen in the Ion Proton data is absent in the MiSeq data. Examination of the benchmark SNVs missed by NB but recovered by ZIP showed that the missed SNVs had lower depth. The absence of the recall drop in MiSeq is due to its relatively even depth, in contrast to the Ion Proton dataset (Fig. 2). For SNVs with > 1 % allele frequency, the F1 scores of all distributions are greater than 0.9 and cluster together.
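The per-allele-frequency analysis described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `metrics_by_af_bin`, the record layout `(allele_freq, is_true, is_called)` and the bin edges are all hypothetical, and it assumes every candidate site, including false positives, carries an allele frequency that can be used for binning.

```python
from collections import defaultdict


def metrics_by_af_bin(records, bin_edges):
    """Compute precision, recall and F1 per allele-frequency bin.

    records   : iterable of (allele_freq, is_true, is_called) tuples
                (hypothetical layout, assumed for illustration only)
    bin_edges : ascending edges; bin i covers [edges[i], edges[i+1])
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))

    for af, is_true, is_called in records:
        # Find the half-open bin [lo, hi) that contains this allele frequency.
        label = next(((lo, hi) for lo, hi in bins if lo <= af < hi), None)
        if label is None:
            continue  # outside all bins, ignore
        if is_true and is_called:
            counts[label]["tp"] += 1
        elif is_called:
            counts[label]["fp"] += 1
        elif is_true:
            counts[label]["fn"] += 1
        # true negatives do not enter precision, recall or F1

    results = {}
    for label, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Toy data only; these are not results from the study.
toy_records = [
    (0.006, True, True),    # true positive in the 0.5 % to 1 % bin
    (0.007, True, False),   # false negative in the 0.5 % to 1 % bin
    (0.008, False, True),   # false positive in the 0.5 % to 1 % bin
    (0.02, True, True),     # true positive above 1 %
    (0.05, True, True),     # true positive above 1 %
]
toy_metrics = metrics_by_af_bin(toy_records, [0.005, 0.01, 1.0])
```

Running the same function once per model (for example on NB and ZIP call sets) and comparing the per-bin recall values would reproduce the kind of comparison discussed in the text, under the assumptions stated above.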