Chunk #27 — Results — Alignment-Independent Methods

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes

Text

We frame this process as assigning a label of either human or mouse to strings of characters. The key step is choosing the sets of features that are then fed into the classification algorithms. Features such as sequence length depend heavily on the size selection step during library preparation, so input strings must be truncated to a constant length (Supplemental Fig. 1A).59 Other sequence features, for example GC content, may be specific to the human genome.60 However, the difference in GC content between these two species (~41% in human and ~42% in mouse for non-sex chromosomes; Supplemental Fig. 1B) is too small to aid the separation.61
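
The two preprocessing ideas in this paragraph, truncating reads to a constant length and computing GC content as a candidate feature, can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation: the function names and the example length of 8 bases are hypothetical, and ambiguous bases (such as N) are assumed to be excluded from the GC denominator.

```python
from typing import Optional


def truncate(seq: str, length: int) -> Optional[str]:
    """Return the first `length` bases, or None if the read is shorter.

    Dropping short reads (rather than padding them) is an assumption made
    for this sketch; it keeps every retained string at the same length.
    """
    return seq[:length] if len(seq) >= length else None


def gc_content(seq: str) -> float:
    """Fraction of G and C among called bases (A, C, G, T), case-insensitive."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # no called bases, so GC content is undefined; report 0.0
    return (seq.count("G") + seq.count("C")) / called


if __name__ == "__main__":
    reads = ["ACGTGGCCAATT", "ACG", "ggccNNatat"]  # made-up example reads
    for read in reads:
        fixed = truncate(read, 8)  # hypothetical constant length
        if fixed is None:
            continue  # read too short to use
        print(fixed, round(gc_content(fixed), 3))
```

Because the genome-wide averages quoted above differ by only about one percentage point, a per-read GC value computed this way would be expected to overlap heavily between the two species, which is consistent with the text's conclusion that GC content alone does not help the separation.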