Chunk #22 — Results — Alignment-Dependent Methods — Separating FASTQ Reads by Primary Alignment Flag

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes

Text

During evaluation of the byAS method, we found that the alignment scores calculated by HISAT2 span a narrow range of values, so there is little resolution between the best and worst alignments. Furthermore, reads with identical alignment scores (~0.51%), while few, are another source of inaccuracy. The SAM file format also provides a flag that distinguishes the “primary” alignment from lower-quality secondary alignments: a secondary alignment has the flag bit 0x100 (decimal 256, “not primary alignment”) set, whereas the primary alignment has it unset. Since the alignment with this bit unset ought to be the one with the best alignment score, the flag appeared to provide a simpler strategy for classifying alignments by species. We therefore designed a second Python script (byPrim.py) to split FASTQ files based on the primary alignment flag in a mixed-genome SAM file.
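The idea of splitting by the primary alignment flag can be illustrated with a minimal sketch. This is not the byPrim.py script itself, only one way the approach might look under stated assumptions: reads are single-end, the reference names in the mixed genome carry a species prefix such as "human_chr1" or "mouse_chr1" (a hypothetical naming convention), and the function names assign_species and split_fastq are invented for illustration. Unmapped reads (flag bit 0x4) are left unassigned, and supplementary alignments (flag bit 0x800) are not handled.

```python
from collections import defaultdict

SECONDARY = 0x100  # SAM flag bit: "not primary alignment"
UNMAPPED = 0x4     # SAM flag bit: read is unmapped


def is_primary(flag):
    """True when the read is mapped and the 0x100 bit is not set."""
    return not (flag & SECONDARY) and not (flag & UNMAPPED)


def assign_species(sam_lines, prefix_sep="_"):
    """Map each read name to the species of its primary alignment.

    Assumption (hypothetical): reference names are prefixed with the
    species, e.g. "human_chr1" or "mouse_chr1".
    """
    assignment = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if is_primary(flag):
            assignment[qname] = rname.split(prefix_sep, 1)[0]
    return assignment


def split_fastq(fastq_lines, assignment):
    """Group four-line FASTQ records by the species assigned to each read."""
    by_species = defaultdict(list)
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        name = record[0][1:].split()[0]  # drop "@" and any description
        by_species[assignment.get(name, "unassigned")].append(record)
    return dict(by_species)


# Toy input: r1 has a primary human and a secondary mouse alignment,
# r2 aligns only to mouse (reverse strand), r3 is unmapped.
sam = [
    "@HD\tVN:1.0",
    "r1\t0\thuman_chr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tmouse_chr2\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tmouse_chr3\t300\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]
fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "TCAA", "+", "IIII",
    "@r3", "GGCC", "+", "IIII",
]

assignment = assign_species(sam)
split = split_fastq(fastq, assignment)
for species, records in sorted(split.items()):
    print(species, [record[0] for record in records])
```

Because the decision depends on a single flag bit rather than a comparison of scores, the sketch needs only one pass over the SAM records and one pass over the FASTQ file. A real script would stream records to per-species output files instead of holding them in memory, and would need to handle both mates for paired-end data.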