Chunk #30 — Results and discussion — Comparing bias across technologies — Bias on human samples

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

Table 2 and Figure 3 show the motif results and bias curves comparing Illumina HiSeq (data set 14), Ion Torrent PGM (data set 15), and Complete Genomics (data set 16) coverage of NA12878. The HiSeq libraries were prepared using the low-input Fisher et al. protocol [31] modified with Kapa Biosystems reagents, whereas the other libraries used the manufacturers' standard protocols (see Materials and methods). We use data set 14 to represent HiSeq performance, rather than the other HiSeq human data sets in Table 2, because it reflects our current best Illumina library construction protocol. Of the data sets tested, the bias curves indicate that the Illumina HiSeq data provided the most even coverage of the human genome. Complete Genomics coverage dropped more severely at both GC extremes and provided only 0.092 relative coverage of the bad promoters, compared with 0.36 for HiSeq. Ion Torrent coverage dropped even more quickly than Complete Genomics coverage as GC increased and provided only 0.046 relative coverage of the bad promoters. Ion Torrent also had the worst performance of the three data sets on the (AT)15 and G|C ≥ 80% motifs.
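To make the bad-promoter figures easier to compare, the following is a minimal Python sketch. It assumes that relative coverage means the mean read depth over a set of positions divided by the mean read depth over all positions; this passage does not spell out that definition, so treat it as an assumption. The function names `relative_coverage` and `fold_vs_reference` and the toy depth values are hypothetical and serve only as illustration. The three relative coverage values are the ones reported above.

```python
# Relative coverage of the bad promoters, as reported in the text.
BAD_PROMOTER_RELATIVE_COVERAGE = {
    "Illumina HiSeq": 0.36,
    "Complete Genomics": 0.092,
    "Ion Torrent PGM": 0.046,
}


def relative_coverage(depths, in_region):
    """Mean depth over flagged positions divided by mean depth over all positions.

    ASSUMPTION: this is an illustrative definition, not necessarily the
    exact computation used to produce the values reported in the text.
    """
    if not depths or len(depths) != len(in_region):
        raise ValueError("depths and in_region must be non-empty and equal in length")
    region = [d for d, flag in zip(depths, in_region) if flag]
    if not region:
        raise ValueError("no positions are flagged as in-region")
    overall_mean = sum(depths) / len(depths)
    if overall_mean == 0:
        raise ValueError("overall mean depth is zero")
    return (sum(region) / len(region)) / overall_mean


def fold_vs_reference(values, reference):
    """Ratio of the reference data set's value to each data set's value."""
    ref = values[reference]
    return {name: ref / value for name, value in values.items()}


# Toy depths (made up, not data from the paper): two poorly covered positions.
toy_depths = [30, 30, 30, 30, 3, 3, 30, 30, 30, 30]
toy_flags = [False] * 4 + [True] * 2 + [False] * 4
toy_relative = relative_coverage(toy_depths, toy_flags)

# How many times better HiSeq covered the bad promoters than each data set.
folds = fold_vs_reference(BAD_PROMOTER_RELATIVE_COVERAGE, "Illumina HiSeq")
```

With the reported values, `folds` works out to roughly 3.9 for Complete Genomics and 7.8 for Ion Torrent PGM, meaning HiSeq's relative coverage of the bad promoters was about four and eight times higher, respectively.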