Chunk #24 — 2 METHODS — 2.6 Datasets

Source: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.
Embedded: yes

Text

positions. These data were used in the cross-validation experiment, described below. The second dataset (Supplementary Dataset 2A–D) consisted of 497 positions from a lobular breast tumor genome that were predicted as SNVs by the SNVMix1 model from data generated on the Illumina GA II platform. These positions were predicted to be non-synonymous protein-coding changes and were subsequently sequenced using Sanger capillary-based technology. Of these, 305 were confirmed as SNVs and 192 were not confirmed. These 497 positions served as the ground truth dataset for sensitivity and specificity calculations. In addition, we generated Affymetrix SNP 6.0 array data for this case and considered 14 649 positions (Supplementary Dataset 3A–D) that matched the coding positions and CRLMM prediction criteria outlined above. All NGS data were aligned to the human genome reference (NCBI build 36.1) using Maq's map tool (v0.6.8). Thus, all comparisons between Maq and SNVMix used the same baseline set of aligned data.
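
To illustrate how a Sanger-validated ground truth set of this kind can be used, the following minimal Python sketch computes sensitivity and specificity from per-position validation labels and caller decisions. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `sensitivity_specificity`, the toy labels, the scores, and the 0.5 threshold are all hypothetical and are not taken from the paper or its supplementary data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    confirmed: Sequence[bool], called: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one set of calls.

    confirmed[i] is True if position i was confirmed as an SNV by validation.
    called[i] is True if the caller under evaluation reports an SNV at position i.
    """
    if len(confirmed) != len(called):
        raise ValueError("confirmed and called must have the same length")

    pairs = list(zip(confirmed, called))
    tp = sum(1 for c, p in pairs if c and p)          # confirmed and called
    fn = sum(1 for c, p in pairs if c and not p)      # confirmed but missed
    tn = sum(1 for c, p in pairs if not c and not p)  # unconfirmed and not called
    fp = sum(1 for c, p in pairs if not c and p)      # unconfirmed but called

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one confirmed and one unconfirmed position")

    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): five positions with validation labels and caller scores.
confirmed = [True, True, True, False, False]
scores = [0.99, 0.80, 0.40, 0.70, 0.10]
threshold = 0.5  # assumed decision threshold, for illustration only
called = [s >= threshold for s in scores]

sens, spec = sensitivity_specificity(confirmed, called)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

With the dataset described above, the denominators would be the 305 confirmed positions for sensitivity and the 192 unconfirmed positions for specificity.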