Chunk #10 — Results — High frequency polymorphisms between sequenced reads and reference genomes

Source: Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.
Embedded: yes

Text

Prior to assessing PGM error profiles, we determined whether there were any genuine polymorphisms between the PGM-determined and reference genomes, which may result from accumulated mutations in the genome [4] or from sequencing errors in the original genome project. We tested each base difference between the reads and their respective reference genome to identify whether the number of observed differences was significantly higher than expected from the error rate (see Methods). Across all datasets, there were a large number of significant differences, predominantly high-frequency insertion and deletion (indel) polymorphisms (Table 2). While the number of polymorphisms appears to be lower for 100 bp OneTouch kits, this is likely due to lower coverage reducing the sensitivity of our ‘polymorphism’ detection. It has previously been reported that the majority of indel polymorphisms detected in PGM reads are false positives, even when the ‘putative’ indel was present across a large number of reads [3], [5]. We would expect that if the indels in our datasets were bona fide polymorphisms, they would be observed across all datasets for the same species.
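The per-site test and the cross-dataset expectation described above can be illustrated with a minimal sketch. The exact statistical procedure is given in the Methods and is not reproduced here, so the sketch assumes a one-sided binomial test with a Bonferroni correction. All function names, the input layout and the default significance level are hypothetical and chosen for illustration only.

```python
import math


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that high coverage does not
    overflow the binomial coefficient.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def significant_sites(counts, error_rate, alpha=0.05):
    """Flag sites whose difference count exceeds what the error rate predicts.

    counts maps a reference position to (n_differences, coverage).
    Assumption: Bonferroni correction over the number of tested sites.
    """
    threshold = alpha / max(len(counts), 1)
    return {
        site
        for site, (k, n) in counts.items()
        if binom_upper_tail(k, n, error_rate) < threshold
    }


def shared_across_datasets(site_sets):
    """Keep only sites flagged in every dataset for the same species."""
    return set.intersection(*site_sets) if site_sets else set()
```

Under these assumptions, a site where 45 of 50 reads differ from the reference is flagged at an error rate of 0.01, whereas a site with 1 differing read out of 50 is not. Intersecting the flagged sites across datasets for one species then retains only candidates consistent with a bona fide polymorphism. The sketch also makes the coverage effect concrete: for the same proportion of differing reads, lower coverage gives a larger tail probability and therefore fewer flagged sites.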