Chunk #2 — Background

Source
GemSIM: general, error-model based simulator of next-generation sequencing data.
Text

There is growing evidence that sequence context (i.e., the nucleotide sequence surrounding a base and the base's position within the read) influences error rates in both Roche/454 and Illumina sequencing [8,9]. This awareness has led to more advanced simulators such as MetaSim and Flowsim [10,11]. Although MetaSim generates reads from many input genomes and uses sequence-context error models, it cannot be trained on real data and does not assign quality values to reads, which limits its potential applications. The recent program Flowsim is the most realistic NGS simulator to date, with advanced error modelling and quality scores [11]. However, it operates only in 'flowspace' and is therefore restricted to simulating Roche/454 pyrosequencing data. Likewise, the unpublished simulator SimSeq [12] empirically captures some characteristic features of Illumina error models, but it allows only a single input genome, does not derive all of its parameters empirically, and cannot simulate Roche/454 data. ART [13], an unpublished cross-platform simulator, also uses context-dependent error models and does assign quality scores. However, it appears to be limited to a single genome and does not allow training on users' own data.
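The idea of a sequence-context error model that can be "trained on real data" and used to "assign quality values" can be illustrated with a small sketch. The Python below keys a substitution rate on the two preceding reference bases and a 10-base position bin. It estimates those rates by counting mismatches in equal-length (reference, read) pairs, draws errors from them when simulating a read, and converts each rate to a Phred quality with Q = -10 log10(p). The context definition (K, POS_BIN), the default rate for unseen contexts, the quality cap of 40, and all function names are hypothetical choices made for illustration. The sketch models substitutions only and ignores insertions and deletions, which are the characteristic Roche/454 error. It is not the error model of GemSIM or of any of the cited simulators.

```python
import math
import random
from collections import defaultdict

# Hypothetical context definition (assumption, not from the text):
# the K reference bases preceding a position plus its position bin in the read.
K = 2
POS_BIN = 10


def context_key(ref, i, k=K, pos_bin=POS_BIN):
    """Return (preceding k-mer, position bin) for base i; pads the read start with N."""
    prev = ref[max(0, i - k):i].rjust(k, "N")
    return prev, i // pos_bin


def train(aligned_pairs, k=K, pos_bin=POS_BIN):
    """Estimate per-context substitution rates from equal-length (reference, read) pairs."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for ref, read in aligned_pairs:
        for i, (r, b) in enumerate(zip(ref, read)):
            key = context_key(ref, i, k, pos_bin)
            totals[key] += 1
            if r != b:
                errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


def phred(p, cap=40):
    """Convert an error probability to a Phred score, Q = -10 log10(p), capped (assumed cap)."""
    if p <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p)))


def simulate(ref, model, rng, default_p=0.01, k=K, pos_bin=POS_BIN):
    """Simulate one read from ref: substitute bases at context-specific rates, attach qualities."""
    bases, quals = [], []
    for i, r in enumerate(ref):
        p = model.get(context_key(ref, i, k, pos_bin), default_p)  # hypothetical fallback rate
        if rng.random() < p:
            bases.append(rng.choice([b for b in "ACGT" if b != r]))  # uniform substitution
        else:
            bases.append(r)
        quals.append(phred(p))
    return "".join(bases), quals
```

For example, training on `[("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGAACGTAC")]` yields a rate of 0.25 for the context `("CG", 0)`. That context occurs twice per read, and one of its four observations is a mismatch, so the corresponding quality is `phred(0.25) == 6`. Keying on reference context keeps training and simulation consistent. A real simulator would also need indel models and a richer quality distribution than one score per context.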