Chunk #48 — Materials and methods — Mutation detection procedure — Step 1: error rate estimation

Source: Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing.
Embedded: yes

Text

12 possibilities when considering strands separately), which are usual predictors of error in sequencing by synthesis [6]. The retained positions were stratified into bins defined by the substitution type (reference allele to mismatched allele), the read position, and the strand. An error rate was estimated for each bin by pooling all the observations in that bin. Pooling is critical because read depth varies among the observations. Because some bins contain few observations, the error rate was smoothed by borrowing information from nearby read positions, which reduces the noise in the estimate. Smoothing was performed separately within each stratum defined by the reference allele, the alternative allele, and the strand. Given a read position k, if the sum of read depth at that position was less than the average depth across all bins, then the estimated error rates from read positions k+1, k-1, k+2, k-2, and so on, were used to smooth the estimate until the sum of read depth exceeded the average depth across all bins. The contributions were weighted by their distance to k as well as by their read depth. The smoothed error rate was computed as E