
Chunk #15 — Results — Coverage and G+C bias

Source
Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.

Text

amyloliquefaciens is clear even when restricting to this 20–50% G+C range. This suggests that the relationship between G+C% and coverage is influenced by the DNA template from which the sequences are derived. We fitted a linear model to the normalised coverage to evaluate the significance and magnitude of the relationship between G+C% and coverage, as well as the influence of species (DNA template) on this relationship (see Methods). All terms in the regression were significant (p < 0.0001), and inspection of the linear model diagnostic plots revealed no dramatic deviations from normal model assumptions (Figure S5). This model can be split into two linear regressions, one for B. amyloliquefaciens and the second for S. tokodaii, in which the response is the normalised coverage and the predictor is the proportion of G+C in a 100 bp window. The B. amyloliquefaciens model describes a small negative effect of increasing G+C% on coverage, whereas the S. tokodaii model describes a larger, positive effect. This relationship requires further investigation with a wider range of species, but it is replicated across the various kits, chips and machines used in this study. While we had originally intended to include the high G+C% organism, Deinococcus maricopensis (Methods), the inconsistency in read throughput across chips for this
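
The split of one model into two per-species regressions, as described in this section, can be illustrated with a minimal sketch. It assumes the model has a species term, a G+C term and a species-by-G+C interaction; the exact specification is in the paper's Methods and is not reproduced here. Under that assumption, the fitted lines are the same as those from a separate ordinary least squares fit per species. The function names, the `species_A`/`species_B` labels and the synthetic values are hypothetical and are not data or coefficients from the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_per_species(windows):
    """Fit one line per species.

    windows: iterable of (species, gc_fraction, normalised_coverage),
    one tuple per 100 bp window.
    Returns {species: (intercept, slope)}.
    """
    grouped = {}
    for species, gc, coverage in windows:
        xs, ys = grouped.setdefault(species, ([], []))
        xs.append(gc)
        ys.append(coverage)
    return {sp: fit_line(xs, ys) for sp, (xs, ys) in grouped.items()}


# Synthetic, noise-free windows for illustration only (NOT values from the paper).
# species_A is given a small negative slope, species_B a larger positive one.
gc_values = [0.20, 0.30, 0.40, 0.50]
windows = (
    [("species_A", gc, 1.10 - 0.20 * gc) for gc in gc_values]
    + [("species_B", gc, 0.60 + 1.00 * gc) for gc in gc_values]
)

fits = fit_per_species(windows)
for species, (intercept, slope) in sorted(fits.items()):
    print(f"{species}: coverage = {intercept:.2f} + ({slope:.2f}) * GC")
```

A per-species fit like this recovers the slopes and intercepts, but not the pooled significance tests reported for the full model; those would need a regression package that fits the interaction model directly.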