Chunk #38 — Results and discussion — Gene-level analysis

Source: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Here we present DESeq2 for the analysis of per-gene counts, i.e., the total number of reads that can be uniquely assigned to a gene. In contrast, several algorithms [28,29] work with probabilistic assignments of reads to transcripts, where multiple overlapping transcripts can originate from each gene. It has been noted that the total read count approach can result in false detection of differential expression when in fact only transcript isoform lengths change, and, in extreme cases, even in LFC estimates with the wrong sign [28]. However, in our benchmark, discussed in the following section, we found that LFC sign disagreements between the total read count and probabilistic-assignment-based methods were rare for genes that were differentially expressed according to either method (Additional file 1: Figure S5). Furthermore, if estimates of average transcript length are available for each condition, they can be incorporated into the DESeq2 framework as gene- and sample-specific normalization factors. In addition, the approach used in DESeq2 can be extended to isoform-specific analysis, either through generalized linear modeling at the exon level with a gene-specific mean as in the DEXSeq package