Chunk #41 — Results and discussion — Comparing bias across libraries — Error biases

Source
Characterizing and measuring bias in sequence data.

Text

insertion rates across the GC spectrum, but the deletion rate rose substantially at the extremes. Within long homopolymers, the behavior of insertion and deletion errors would depend on whether a technology systematically over- or under-reports homopolymer length. For example, as homopolymer lengths increased, Ion Torrent showed an increased deletion rate, but the insertion rate stayed about the same. In contrast, the insertion and deletion rates of Illumina data increased in longer homopolymers, which is consistent with their behavior in GC-extreme regions. In the Illumina and Ion Torrent human data, these trends were unchanged when the data were realigned to a sample-specific reference [37] that accounted for known biological variations (Figure S1 in Additional file 4, Figure S2 in Additional file 5). Similarly consistent with GC behavior are the decrease in insertions and increase in deletions observed in Pacific Biosciences data in long homopolymers. In general, the sequence-context dependence of error rates varied considerably from technology to technology.
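
As an illustration of how insertion and deletion rates could be tabulated against homopolymer length, here is a minimal Python sketch. It is not the method used in the paper. The function names, the (position, kind) event format, and the choice to normalize by the number of reference bases in runs of each length are all assumptions made for this example.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(seq):
    """Yield (start, length) for each maximal run of identical bases."""
    pos = 0
    for _, group in groupby(seq):
        length = len(list(group))
        yield pos, length
        pos += length


def indel_rates_by_run_length(reference, events):
    """Return {run_length: {"ins": rate, "del": rate}}.

    Hypothetical input format: `events` is an iterable of (position, kind)
    pairs, where position is a 0-based reference coordinate and kind is
    "ins" or "del". Rates are events per reference base (an assumed
    normalization) among bases that sit in runs of the given length.
    """
    run_length_at = [0] * len(reference)
    bases = Counter()  # reference bases observed per run length
    for start, length in homopolymer_runs(reference):
        bases[length] += length
        for i in range(start, start + length):
            run_length_at[i] = length

    counts = {"ins": Counter(), "del": Counter()}
    for position, kind in events:
        counts[kind][run_length_at[position]] += 1

    return {
        length: {kind: counts[kind][length] / n for kind in ("ins", "del")}
        for length, n in sorted(bases.items())
    }


if __name__ == "__main__":
    # Toy reference with one 5-base homopolymer; events are made up.
    toy_events = [(3, "del"), (5, "del"), (6, "ins"), (0, "ins")]
    for length, r in indel_rates_by_run_length("ACGTTTTTAC", toy_events).items():
        print(length, r)
```

Comparing the "ins" and "del" columns across run lengths is one way to see whether a technology tends to under-report homopolymer length (deletions rising while insertions stay flat) or to err in both directions.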