The virtual tumor approach begins with deep-coverage whole-genome data from a single sample (NA12878) sequenced on Illumina HiSeq instruments by the 1000 Genomes Project (ref. 42; 2 libraries, “Solexa-18483” and “Solexa-18484”, at 30x each) and by Gnerre et al. (ref. 43; 1 library, “Solexa-23661”, at 30x). These data are publicly available; details are in Supplementary Table 5.