Chunk #3 — Introduction

Source: Optimized splitting of mixed-species RNA sequencing data.
Embedded: yes
Text

Several strategies have been used for processing mixed-species RNAseq data. Regression-based methods decompose expression data from a mixed cell population, but they depend on a reference gene expression profile estimated from purified cell populations obtained by cell sorting.25, 26 Other methods require a pre-captured reference profile from single-cell expression data to estimate the composition of cell mixtures,27–29 which is not always feasible.30 Here we compare methods that directly classify bulk RNAseq reads as human or mouse. The goal is to perform bulk RNAseq on a mixed population of human and mouse cells and to assign each sequencing read to its species without a pre-captured reference profile or single-cell sequencing. Our alignment-based approach maps raw sequencing data to pooled reference genomes and compares strategies for choosing the best alignment to classify reads by species. The alignment-free approach employs a one-dimensional convolutional neural network, an architecture commonly used in image classification and text matching problems.31, 32 We demonstrate their performance using different sets of mixed RNAseq data from pre-acquired purified human and mouse cells. We found the alignment-free method is accurate when assessing