Chunk #39 — Results and discussion — Comparing bias across libraries — Error biases

Source: Characterizing and measuring bias in sequence data.

Text

Coverage bias is an important sequencing metric, but it says nothing about variation in sequence accuracy. For many applications, a decrease in accuracy could offset the advantage of better relative coverage in difficult regions. To compare platforms and assess the influence of sequence context, Figure 4 plots the mismatch, deletion, and insertion rates on P. falciparum, R. sphaeroides, and human for the four surveyed technologies as a function of GC content, and Figure 5 plots the same rates as a function of homopolymer length. Because the rates vary greatly, a logarithmic scale is used to facilitate comparison between technologies and between error types. Table 5 lists the genome-wide error rates for the four platforms. For human, the reported errors include bona fide differences between the NA12878 sample and the reference sequence, so the human error rates are somewhat inflated. When Illumina NA12878 data (data set 14) were aligned to an NA12878-specific reference [37], the mismatch rate declined by 40% and the indel rate declined by 80% (Table S2 in Additional file 3). A similar experiment yielded no substantial change in the Ion Torrent error rates: because those rates are larger in magnitude, true differences between sample and reference account for only a small share of them.
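As an illustration of how per-context error rates of the kind plotted in Figure 4 could be tabulated, the following is a minimal sketch, not the procedure used in this study. It assumes that alignments have already been summarized into fixed-size reference windows, each carrying its GC percentage, the number of aligned bases, and counts of mismatches, deletions, and insertions. The function names, the tuple layout, the 10% bin width, and the plotting floor are all hypothetical choices made for the example.

```python
from collections import defaultdict
from math import log10


def error_rates_by_gc(windows, bin_width=10):
    """Pool error counts into GC bins and return per-base rates.

    windows: iterable of (gc_percent, aligned_bases, mismatches,
             deletions, insertions) tuples (a hypothetical layout).
    Returns {bin_start: {"mismatch": r, "deletion": r, "insertion": r}}.
    """
    # Per bin: [aligned bases, mismatches, deletions, insertions]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for gc, bases, mismatches, deletions, insertions in windows:
        # Windows at exactly 100% GC fall into the last bin.
        bin_start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts = totals[bin_start]
        counts[0] += bases
        counts[1] += mismatches
        counts[2] += deletions
        counts[3] += insertions

    rates = {}
    for bin_start, (bases, mm, dele, ins) in sorted(totals.items()):
        if bases == 0:
            continue  # no aligned bases, so the rate is undefined
        rates[bin_start] = {
            "mismatch": mm / bases,
            "deletion": dele / bases,
            "insertion": ins / bases,
        }
    return rates


def log10_rate(rate, floor=1e-7):
    """Log-scale a rate for plotting; zero rates are clamped to an assumed floor."""
    return log10(max(rate, floor))


# Made-up counts, for illustration only.
example_windows = [
    (35, 1000, 10, 1, 2),
    (38, 1000, 30, 1, 0),
    (100, 500, 5, 0, 0),
]
example_rates = error_rates_by_gc(example_windows)
```

Pooling counts within a bin before dividing, rather than averaging per-window rates, weights each window by its number of aligned bases, so sparsely covered windows do not dominate the bin. The same tabulation could be keyed on homopolymer length instead of GC percentage to produce the view in Figure 5.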