Chunk #10 — Results

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes
Text

To illustrate how alignment counts can be misrepresented, we analyzed the 50% mixture of reads aligned to a combined human and mouse reference genome using HISAT2 (Table 1). Because each read ID was pre-tagged with its source genome, comparing that tag with the species of the reference chromosome to which the read aligned allowed us to count the correctly paired alignments and to compute a percent misalignment (the fraction of total aligned read pairs assigned to the wrong genome). Although this fraction was small (0.15% for human reads and 0.40% for mouse reads), it was not zero. Furthermore, the number of aligned pairs exceeded the number of input reads from each species, by 4.43% for hg38 and 0.40% for mm10. These mismatches were largely due to lower-quality alignments: most mismatches had a MAPQ of 1, whereas the majority of correct alignments had a MAPQ of 60 (Supplemental Table 1). Many users who run HISAT2 with default parameters would retain these low-MAPQ alignments unless they subsequently filtered by MAPQ. More concerning was the observation that summary counts for several individual genes had a substantial misassignment due to