Chunk #12 — Results and discussion — Motif bias

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

approached this problem in two ways: by evaluating specific biologically important regions of the genome that are significantly undercovered, and by identifying specific sequence motifs that are systematically undercovered. Anecdotal results suggested that many transcription start sites or first exons in the human genome tend to have poor coverage. Through a systematic analysis of these regions, we identified the 1,000 with the lowest relative coverage in an Illumina data set, which we term the 'bad promoters' list (see Materials and methods). Like many exons, the bad promoters are GC-rich (averaging 79% GC composition).
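
The selection step described above can be illustrated with a minimal sketch. It is not the procedure from Materials and methods. It assumes that relative coverage is the mean read depth over a region divided by the genome-wide mean depth, and that GC composition is the fraction of called bases (A, C, G, T) that are G or C. The function names and the input layout (per-base depth lists and a dictionary keyed by region name) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def relative_coverage(region_depths: Sequence[float], genome_mean_depth: float) -> float:
    """Mean depth over a region divided by the genome-wide mean depth.

    Assumption: this ratio is the working definition of 'relative coverage'.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    if len(region_depths) == 0:
        raise ValueError("region_depths must not be empty")
    return (sum(region_depths) / len(region_depths)) / genome_mean_depth


def gc_fraction(sequence: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C; other symbols are ignored."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def lowest_coverage_regions(
    rel_cov_by_region: Dict[str, float], n: int = 1000
) -> List[Tuple[str, float]]:
    """Return the n regions with the lowest relative coverage, lowest first."""
    return sorted(rel_cov_by_region.items(), key=lambda item: item[1])[:n]


# Toy usage with made-up depths; real input would be per-base depths for each promoter region.
toy_depths = {"promoter_A": [2, 3, 1], "promoter_B": [40, 38, 42], "promoter_C": [10, 12, 8]}
toy_rel_cov = {name: relative_coverage(d, genome_mean_depth=40.0) for name, d in toy_depths.items()}
toy_worst = lowest_coverage_regions(toy_rel_cov, n=2)
```

With real data, `n=1000` would correspond to the size of the 'bad promoters' list, and `gc_fraction` could be applied to each selected region's reference sequence to check its GC composition.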