Chunk #43 — Results and discussion — Discovering uncategorized bias

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

Finally, to understand bias in the human genome that was not explained by our motifs, we generated >100-fold coverage of NA12878 from Illumina HiSeq data, using libraries prepared with Kapa Biosystems reagents (Table 2, data set 14). We note that some apparently low or missing coverage will be due to true biological differences, including sequences that are present in the reference but absent from NA12878. However, we used other deeply sequenced data sets and an assembly-based analysis to filter out many of these variant loci, as described below.
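As an illustration only, the general idea of flagging low-coverage regions and then discarding those that coincide with known variant loci might be sketched as follows. This is not the authors' pipeline: the function names, the half-open interval representation, the `min_depth` threshold, and the toy depth values are all assumptions made for the example, and the variant loci are simply taken as given rather than derived from other data sets or an assembly.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: 0-based, half-open [start, end) intervals.
Interval = Tuple[int, int]


def low_coverage_intervals(depths: Sequence[int], min_depth: int) -> List[Interval]:
    """Return maximal runs of positions whose depth is below min_depth."""
    intervals: List[Interval] = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos  # open a new low-coverage run
        elif start is not None:
            intervals.append((start, pos))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(depths)))  # run extends to the end
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def drop_variant_loci(intervals: Sequence[Interval],
                      variant_loci: Sequence[Interval]) -> List[Interval]:
    """Keep only low-coverage intervals that overlap no known variant locus."""
    return [iv for iv in intervals
            if not any(overlaps(iv, v) for v in variant_loci)]


# Toy per-base depths (assumed values, not data from the paper).
toy_depths = [120, 110, 3, 0, 2, 105, 98, 1, 0, 130]
toy_variants = [(8, 12)]  # assumed locus absent from the sample

candidates = low_coverage_intervals(toy_depths, min_depth=10)
unexplained = drop_variant_loci(candidates, toy_variants)
```

In this sketch, low-coverage intervals that survive the filter are the candidates for bias not attributable to true sequence differences; a linear scan over variant loci is used for clarity, and an interval tree or sorted sweep would be a natural substitute at genome scale.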