Chunk #90 — Materials and methods — Filtering NA12878 data for the discovery of uncharacterized bias

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

to HG19: 152-fold from HiSeq v2 chemistry (the Phusion, Phusion + betaine, and AccuPrime data discussed previously, data sets 10 to 12), 110-fold from version 3 chemistry using low-input Fisher et al. library construction (data set 13 with four additional lanes from data set A1), and 120-fold from version 3 chemistry with Kapa-based library construction (the previously discussed 'Kapa' data set 14). Any reference base with a relative coverage below 0.1 in all three NA12878 data sets was considered 'undercovered'. By construction, these bases are a subset of the bases that are undercovered in the HiSeq 'Kapa' data: if a base was undercovered in the 'Kapa' data but not in at least one of the other two data sets, we assumed that its poor coverage in the 'Kapa' data might be due to technology rather than biology.
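The filtering rule can be illustrated with a minimal Python sketch. It is not the pipeline used for the analysis: the function names, the toy depth values, and the definition of relative coverage as per-base depth divided by the mean depth over the positions supplied are all assumptions made for illustration. Only the 0.1 threshold and the requirement that a base fall below it in all three data sets come from the text.

```python
# Illustrative sketch only; names and toy data are hypothetical.

def relative_coverage(depths):
    """Per-base depth divided by the mean depth of the data set.

    Assumption: relative coverage is normalized by the mean over the
    positions supplied here.
    """
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean depth is zero; relative coverage is undefined")
    return [d / mean_depth for d in depths]


def undercovered_in_all(datasets, threshold=0.1):
    """Flag positions whose relative coverage is below `threshold`
    in every data set (the intersection rule described in the text)."""
    n_positions = len(datasets[0])
    if any(len(d) != n_positions for d in datasets):
        raise ValueError("all data sets must cover the same positions")
    relative = [relative_coverage(d) for d in datasets]
    return [
        all(rel[i] < threshold for rel in relative)
        for i in range(n_positions)
    ]


# Toy per-base depths for five reference positions (made-up numbers).
hiseq_v2 = [150, 160, 5, 2, 155]
v3_low_input = [110, 100, 3, 120, 115]
v3_kapa = [120, 125, 4, 1, 130]

flags = undercovered_in_all([hiseq_v2, v3_low_input, v3_kapa])
print(flags)
```

In this toy example the third position is flagged because it falls below the threshold in all three data sets. The fourth position is poorly covered in the 'Kapa' and v2 data but well covered in the low-input data, so it is not flagged, mirroring the reasoning that such a deficit might be due to technology rather than biology.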