Chunk #61 — Methods — Statistical filtering and analysis — Polymorphisms

Source: Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.
Embedded: yes

Text

Beginning with substitution differences between the read and reference, we simply modeled the observed number of differences, X, at each reference position as a binomial, with the probability of substitution estimated from the data (mean substitution rate). Due to the large number of comparisons, the p-values were corrected using Holm's method [22]. The significance threshold was taken to be 0.05. A similar approach was adopted for detecting insertion and deletion polymorphisms, however, in the case of single-base insertions (over-calls of zero) the coverage of the site was taken to be the maximum coverage of the bases immediately adjacent to the insert. The probability of an insertion or deletion was also estimated from the data, but parameterised on the homopolymer length, as previous reports have shown that the error rate increases dramatically with homopolymer length [2].