Variants in the tumor are identified by analyzing the data at each site under two alternative models: (i) a reference model, M_0, which assumes that there is no variant at the site and that any observed non-reference bases are due to random sequencing errors; and (ii) a variant model, M_f^m, which assumes that the site contains a true variant allele m at allele fraction f in addition to sequencing errors. The allele fraction f is unknown and is estimated as the fraction of tumor reads that support m. Modeling f explicitly, rather than assuming a heterozygous diploid event, makes our method more sensitive than other methods [21,22]. We declare m to be a candidate variant if the log-likelihood ratio of the data under the variant and reference models (that is, the log odds, or LOD score) exceeds a predefined decision threshold that depends on the expected mutation frequency and the desired false positive rate (Online Methods). The choice of decision threshold controls the tradeoff between specificity and sensitivity, as described by a receiver operating characteristic (ROC) curve (Fig. 2a,