Chunk #57 — Methods — Read preparation and alignment

Source: Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.
Embedded: yes

Text

Each read was aligned to its respective reference genome using Segemehl version 0.1.2 [20], an InDel tolerant short-read aligner. The alignment output was parsed and the aligned regions extracted from both the read and reference RLEs. This was necessary as both substitutions and identical bases are reported as matches (‘M’) in the SAM format generated by this version of Segemehl. Using the SeqOp string and the aligned sequence segments, each position in the alignment was recorded as a match, substitution, insertion or deletion. Using a set of in-house Perl scripts and SQL databases, the alignment positions were mapped to their relevant flow-value, base position/s and quality scores. Read level attributes such as average quality, average G+C% in 100 bp windows and read length were also calculated. To avoid over-inflation of the error rate due to (1) 5′ misalignments and (2) homopolymers which overlapped the last base of the key and the first base/s of the read, we only consider base positions 10 and greater for analysis. Furthermore, analysis was restricted to the first 100 bp for the Ion OneTouch Template