Chunk #26 — 2 METHODS — 2.8 Benchmarking experiments

Source: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.

Text

To evaluate the effect of estimating parameters, we designed a 4-fold cross-validation study. We permuted the 144 271 positions with matched array-based genotype data from the ovarian cancer data and divided them into four equal parts. We fit the model to three parts (training data) using EM and used the converged parameters to calculate p(SNV_i) for each position in the remaining part (test data), holding out each of the four parts in turn. We repeated this procedure 10 times and computed the AUC for each of the 40 cases (4 folds × 10 replicates). We also computed the AUC from the results predicted by Maq v0.6.8 and compared its AUC distribution across the 40 cases with those of SNVMix1 and SNVMix2. These data also allowed us to determine the range of converged parameter estimates across the folds and 10 replicates. We also tested the effect of depth-based thresholding by running SNVMix1 on the 14 649 positions from the breast cancer genome. To simulate the thresholding, we set p(SNV_i) = 0 at locations where the depth N_i was below a threshold chosen from the set {0, 1, …, 7, 10}. We compared SNVMix1, SNVMix2 and Maq on these data as well. Finally, we evaluated
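The cross-validation and depth-thresholding protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SNVMix implementation. The function names (`auc`, `kfold_splits`, `cross_validated_aucs`, `apply_depth_threshold`) are hypothetical, and `fit` and `predict` are placeholder callables standing in for EM training and the computation of p(SNV_i). The sketch computes the AUC with the rank-sum identity and assumes every test fold contains both SNV and non-SNV positions.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def kfold_splits(n, k, rng):
    """Permute indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validated_aucs(data, labels, fit, predict, k=4, repeats=10, seed=0):
    """Return one AUC per (replicate, fold) pair: k * repeats values in total.

    `fit` sees only the training positions (no labels), mirroring an
    unsupervised fit; `labels` are used solely to score the test positions.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for train, test in kfold_splits(len(data), k, rng):
            params = fit([data[i] for i in train])
            scores = [predict(params, data[i]) for i in test]
            aucs.append(auc([labels[i] for i in test], scores))
    return aucs


def apply_depth_threshold(probs, depths, min_depth):
    """Set p(SNV_i) to 0 wherever the depth N_i is below min_depth."""
    return [0.0 if n < min_depth else p for p, n in zip(probs, depths)]


# Thresholds from the text: {0, 1, ..., 7, 10}.
DEPTH_THRESHOLDS = list(range(8)) + [10]
```

With `k=4` and `repeats=10`, `cross_validated_aucs` returns 40 AUC values, one per fold and replicate. Applying `apply_depth_threshold` for each value in `DEPTH_THRESHOLDS` before calling `auc` reproduces the structure of the thresholding experiment; a threshold of 0 leaves the probabilities unchanged.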