Chunk #31 — Results — Alignment-Independent Methods

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes

Text

human reads outperformed models trained on other training sets. With 90% or more human reads in the testing sets, models trained on 90% and 100% human reads performed similarly. The total accuracy correlated positively with the ratio of human reads in the testing sets. To identify a general training strategy for different ratios of human reads in the testing sets, we compared accuracies across multiple possible testing datasets. The performance of models tested with 0%, 10%, and 50% human reads decreased as the proportion of human reads in the training sets decreased (Fig. 3B). However, these accuracies dropped when the training sets contained 50% or more human reads. This makes intuitive sense: more human reads in the training data emphasize the features of human reads over those of mouse reads, which leads to worse performance when the testing set contains fewer human reads. The accuracy for testing sets with a higher percentage of human reads, on the other hand, rose sharply once the training sets contained more than 10% human reads and then continued to increase with the proportion of human reads in the training sets. Thus, we propose that, to optimize the performance of the model, the training sets should contain an equal number of reads from both species.
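
The ratio sweep described above could be set up along the following lines. This is a minimal sketch rather than the authors' pipeline: the function name `mix_reads`, the `"human"`/`"mouse"` labels, the placeholder read strings, and the grid of ratios are all assumptions made for illustration, and the reads are assumed to be already separated by species.

```python
import random


def mix_reads(human_reads, mouse_reads, human_fraction, total, seed=0):
    """Build a labeled read set with a requested fraction of human reads.

    Hypothetical helper for illustration; not taken from the paper.
    Returns a shuffled list of (read, label) pairs.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    n_human = round(total * human_fraction)
    n_mouse = total - n_human
    if n_human > len(human_reads) or n_mouse > len(mouse_reads):
        raise ValueError("not enough reads to build the requested mixture")

    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    mixed = [(read, "human") for read in rng.sample(human_reads, n_human)]
    mixed += [(read, "mouse") for read in rng.sample(mouse_reads, n_mouse)]
    rng.shuffle(mixed)  # avoid all human reads preceding all mouse reads
    return mixed


# Placeholder reads; real input would be sequences parsed from FASTQ files.
human_pool = [f"human_read_{i}" for i in range(1000)]
mouse_pool = [f"mouse_read_{i}" for i in range(1000)]

# Assumed grid of human-read fractions to sweep over.
ratios = [0.0, 0.1, 0.5, 0.9, 1.0]
training_sets = {r: mix_reads(human_pool, mouse_pool, r, total=200) for r in ratios}

# The balanced set proposed in the text corresponds to human_fraction=0.5.
balanced = training_sets[0.5]
```

Each entry of `training_sets` would then train one model, and each model would be scored against testing sets built the same way at every ratio, giving the training-ratio by testing-ratio comparison that the paragraph describes.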