Chunk #54 — Conclusions

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

Sequencing vendors and individual investigators alike strive to improve the quality of their data. This includes increasing read length, yield, overall base quality, and other average measures that reflect the behavior of the technology on 'typical' parts of the genome. However, such measures do not tell us how the technology performs on the 'hardest' parts of the genome, where data quality is lowest, and this is a critical omission. For example, as we have noted, in many human data sets there are large numbers of transcription start sites and first exons with essentially no coverage, and although this bias affects only a tiny fraction of the genome, it is of fundamental importance to the utility of the data.
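To make the contrast between average measures and worst-case behavior concrete, the following is a minimal sketch on synthetic data; it is not the method used in the paper. It assumes per-base coverage is available as a plain list and that regions of interest (for example, first exons) are given as half-open intervals. The function name `summarize_coverage`, the `min_depth` threshold, and the toy numbers are all hypothetical, chosen for illustration only.

```python
from statistics import mean


def summarize_coverage(depth, regions, min_depth=1.0):
    """Return (mean depth over all bases, fraction of regions below min_depth).

    depth     -- per-base coverage values (hypothetical input format)
    regions   -- list of (start, end) half-open intervals into `depth`
    min_depth -- illustrative threshold for "essentially no coverage"
    """
    overall = mean(depth)
    region_means = [mean(depth[start:end]) for start, end in regions]
    uncovered = sum(1 for m in region_means if m < min_depth)
    return overall, uncovered / len(regions)


# Synthetic example: 1000 bases at depth 30, except one 20-base region at depth 0.
depth = [30] * 1000
for i in range(500, 520):
    depth[i] = 0

# Three hypothetical regions of interest; the second one has no coverage.
regions = [(100, 120), (500, 520), (800, 820)]

overall, frac_uncovered = summarize_coverage(depth, regions)
print(f"mean depth: {overall:.1f}")
print(f"fraction of regions uncovered: {frac_uncovered:.2f}")
```

In this toy case the mean depth barely moves from 30, yet one of the three regions of interest has no coverage at all, which is the kind of failure that an average measure would not reveal.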