Chunk #35 — Discussion

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes

Text

The overall performance of the alignment-based method, however, depends heavily on the quality of the sequencing data, and the method could be less effective when poor sequencing quality leads to a lower alignment rate. Therefore, we also evaluated two alignment-independent methods, HMM and CNN, which separate reads based on features within the sequences without first aligning them to reference genomes. HMM did not adequately separate reads by species. CNN, however, could be applied to sequencing datasets from organisms whose genomes are not well annotated. Importantly, CNN provides better and faster classification of RNAseq reads from two species than HMM (Fig. 2). Note that our evaluation of these methods might be limited by our choice of cell types, although these are the primary cell types appropriate for our intended application. We suspect that the suboptimal performance of such probabilistic models is due to the high similarity between the linear sequences of the human and mouse genomes.
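To make the idea of alignment-independent features concrete, the following is a minimal sketch of how a read might be turned into input for a convolutional model and scanned by a single filter. It is not the architecture evaluated in this study: the one-hot encoding, the function names `one_hot` and `conv1d_max`, and the hand-written `gat_filter` are all assumptions chosen for illustration. A trained CNN would learn many such filters from labelled reads rather than use a fixed motif.

```python
# Illustrative sketch only; names and the filter below are hypothetical,
# not taken from the study.
BASES = "ACGT"


def one_hot(read):
    """Encode a read as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in read.upper()]


def conv1d_max(encoded, kernel):
    """Slide a (width x 4) kernel along the read and return the maximum response.

    This mimics one convolutional filter followed by global max pooling.
    Reads shorter than the kernel return 0.0.
    """
    width = len(kernel)
    if len(encoded) < width:
        return 0.0
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return max(scores)


# Hypothetical filter that responds most strongly to the 3-mer "GAT".
gat_filter = one_hot("GAT")

# The read contains "GAT", so all three kernel positions match.
response = conv1d_max(one_hot("ACGATTC"), gat_filter)
```

Because the filter responds to local sequence content rather than to a position in a reference genome, no alignment step is needed. The same property suggests why highly similar human and mouse sequences can be hard to separate: few local patterns differ between them.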