Many mutation detection methods have been developed, but there are few systematic approaches for benchmarking their performance on real sequencing data. Previously published simulation methods range from fully synthetic models21 to ones that better capture real sequencing errors11. However, none of these methods models the full diversity of non-random sequencing errors in both the reference and alternate alleles at a given genomic site. To better evaluate the performance of mutation detection methods, we used two benchmarking approaches: down-sampling and virtual tumors.