The targeted region for the Ion Proton datasets included all exons of 409 known cancer-related genes, totaling about 1.7 million bases covered by about 16,000 amplicon primer pairs from the Ion AmpliSeq™ Comprehensive Cancer Panel. The training benchmark is the DNA sequencing data of NA11993. The testing benchmark mimics the paired normal-tumor design: the normal sample is the DNA sequencing data of NA12878, while the tumor sample is a mixture of 17 individuals from the 1000 Genomes Project plus NA12878. The mixing percentages are listed in Additional file 2. The sequencing data were aligned with TMAP from the Torrent Suite software. Reads with mapping quality less than 40 were filtered out.
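The mapping-quality filter described above can be illustrated with a minimal sketch. It is not the pipeline actually used for these datasets: it assumes the alignments are available as SAM-format text lines, in which MAPQ is the fifth tab-separated field, and the function name `filter_sam_lines` and the example records are hypothetical.

```python
MIN_MAPQ = 40  # threshold stated in the text: reads with MAPQ < 40 are dropped


def filter_sam_lines(lines, min_mapq=MIN_MAPQ):
    """Yield SAM header lines and alignment records with MAPQ >= min_mapq.

    Hypothetical helper for illustration; assumes plain-text SAM input.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM alignment record has 11 mandatory fields; skip malformed lines.
            continue
        mapq = int(fields[4])  # MAPQ is the 5th mandatory SAM field
        if mapq >= min_mapq:
            yield line


# Made-up records for illustration only (not data from the study).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t39\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t16\tchr1\t300\t40\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_sam_lines(example))
```

In this sketch a read with MAPQ exactly 40 is retained, matching the wording "less than 40 were filtered out". In practice the same filter would more likely be applied to BAM files with a dedicated alignment-processing tool.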