Chunk #9 — Results and discussion — Per-base bias

Source: Characterizing and measuring bias in sequence data.

Text

Some reads cannot be mapped to a single locus, and the probability of ambiguous mapping increases as reads become shorter or less accurate. Ambiguous mapping is also more likely for reads derived from repetitive or low-complexity regions of the genome, including some regions with extreme GC content. To handle such reads, we rely on the aligner's policy of assigning a read at random to one of its 'best' alignments whenever there are several equally good ones. Given the data, this yields the best measurement of coverage bias available. For an ambiguously mapped read it is impossible to know which of the candidate locations it came from, and therefore whether those specific locations are evenly represented. However, because a read aligns equally well to each candidate, the candidates typically share the same local sequence, so we can still expect to assess accurately the coverage of classes of bases defined by some local sequence context (for example, GC content). All the alignment algorithms used in this work (see Materials and methods) use this random-placement policy.
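The random-placement policy and the class-level measurement it supports can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the code of any aligner used in this work: each read is represented by a hypothetical list of candidate `(position, score)` alignments, ties among the best scores are broken with `random.Random.choice`, coverage is simplified to counting read start positions, and the class of a base is the number of G/C bases in the `window` bases starting at it. The names `place_read`, `gc_class` and `coverage_by_class` are illustrative only.

```python
import random
from collections import Counter


def place_read(alignments, rng):
    """Pick one locus uniformly at random among the best-scoring alignments.

    `alignments` is a list of (position, score) pairs, a hypothetical
    stand-in for an aligner's candidate list.
    """
    best = max(score for _, score in alignments)
    tied = [pos for pos, score in alignments if score == best]
    return rng.choice(tied)


def gc_class(genome, pos, window):
    """Class of a base: number of G/C bases in genome[pos:pos + window]."""
    return sum(base in "GC" for base in genome[pos:pos + window])


def coverage_by_class(genome, reads, window, seed=0):
    """Mean read starts per position, grouped by GC class.

    Each read is a list of candidate alignments; ambiguous reads are placed
    at random with place_read. Only read starts are counted, which is a
    simplification of full per-base coverage.
    """
    rng = random.Random(seed)
    n_positions = len(genome) - window + 1
    classes = [gc_class(genome, p, window) for p in range(n_positions)]
    positions_per_class = Counter(classes)
    starts_per_class = Counter()
    for alignments in reads:
        pos = place_read(alignments, rng)
        starts_per_class[classes[pos]] += 1
    return {c: starts_per_class[c] / positions_per_class[c]
            for c in positions_per_class}


# Toy genome with two identical copies of "GCGC" (at positions 4 and 12).
genome = "AATTGCGCAATTGCGCAATT"
ambiguous = [(4, 30), (12, 30), (0, 10)]  # ties between the two repeat copies
unique = [(0, 30)]
coverage = coverage_by_class(genome, [ambiguous, ambiguous, unique], window=4)
```

In this toy case the seed decides whether the two ambiguous reads land on position 4, position 12, or one on each, so per-position coverage of the repeat is arbitrary. Both copies have the same local sequence and hence the same class, so the class-level value for windows containing four G/C bases is 1.0 for every seed.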