Chunk #3 — 2 MODEL

Source: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Embedded: yes

Text

We assume the data can be summarized as a table of counts, with rows corresponding to genes (or tags, exons or transcripts) and columns corresponding to samples. For RNA-seq experiments, these may be counts at the exon, transcript or gene level. We model the counts as negative binomial (NB) distributed,

$$Y_{gi} \sim \mathrm{NB}(M_i p_{gj}, \phi_g) \tag{1}$$

for gene $g$ and sample $i$, where $Y_{gi}$ denotes the observed count. Here, $M_i$ is the library size (total number of reads), $\phi_g$ is the dispersion and $p_{gj}$ is the relative abundance of gene $g$ in experimental group $j$, to which sample $i$ belongs. We use the NB parameterization in which the mean is $\mu_{gi} = M_i p_{gj}$ and the variance is $\mu_{gi}(1 + \mu_{gi}\phi_g)$. For differential expression analysis, the parameters of interest are the $p_{gj}$.
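To make the mean-dispersion parameterization concrete, the following is a minimal Python sketch, not edgeR's implementation. It assumes the standard correspondence between this parameterization and the usual NB "size" form, with size r = 1/ϕ and mean μ, and checks numerically that the resulting distribution has mean μ and variance μ(1 + μϕ). The function names and the example values (library size, relative abundance, dispersion) are hypothetical and chosen only for illustration.

```python
import math


def nb_mean(library_size, rel_abundance):
    """Mean count mu_gi = M_i * p_gj."""
    return library_size * rel_abundance


def nb_variance(mu, phi):
    """Variance mu * (1 + mu * phi) under the mean-dispersion parameterization."""
    return mu * (1.0 + mu * phi)


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y for an NB with mean mu and dispersion phi > 0.

    Assumption: size r = 1 / phi, success probability r / (r + mu).
    """
    if phi <= 0:
        raise ValueError("phi must be positive in this sketch")
    r = 1.0 / phi
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def moments_from_pmf(mu, phi, y_max):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean * mean


# Hypothetical example values: one million reads, relative abundance 5e-5.
mu = nb_mean(1_000_000, 5e-5)
phi = 0.1
numeric_mean, numeric_var = moments_from_pmf(mu, phi, y_max=2000)
print(mu, nb_variance(mu, phi), numeric_mean, numeric_var)
```

With ϕ fixed, the variance grows quadratically in the mean, so the extra term μ²ϕ represents variation beyond the Poisson variance μ; as ϕ approaches zero the variance approaches μ.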