Chunk #5 — 3. DETECTION OF TECHNICAL BIASES IN RNA-SEQ DATA

Source: RNA-Seq reveals novel transcriptional reorganization in human alcoholic brain.
Embedded: yes

Text

Obtaining an accurate assessment of the RNA molecules associated with disease is not a trivial undertaking and should include a comprehensive evaluation of expression estimates for potential sources of technical bias (Ozsolak & Milos, 2011). Transcript length and guanine–cytosine (GC) content are two characteristics that may influence the quantification of RNA-Seq data (Oshlack & Wakefield, 2009; Pickrell et al., 2010). Nonnormalized expression counts follow a similar trend in alcoholics and matched controls with respect to both the length (Fig. 11.5A) and the percentage GC content (Fig. 11.6A) of identified transcripts. Without normalization, the length and GC content of mapped features are significantly associated with expression in both groups (Figs. 11.5B and 11.6B). Correcting expression estimates by converting counts to reads per kilobase per million mapped reads (RPKM), one method that accounts for molar concentration and transcript length (Mortazavi, Williams, McCue, Schaeffer, & Wold, 2008), effectively alleviated the significant bias introduced by transcript length in both controls and alcoholics (Fig. 11.5D). Using RPKM values also blunted the relationship between GC content and computed expression values (Fig. 11.6D), although not to