Chunk #56 — Methods (full – for online materials) — RNA-seq mapping and transcript assembly

Source: An atlas of active enhancers across human cell types and tissues.
Embedded: yes

Text

Single-end 100 bp reads from libraries derived from the same cell source (all six “CD19+ B cells” libraries, all six “CD8+ T cells” libraries and one “Fetal heart” library) were processed together via the Moirai pipeline (Hasegawa et al., manuscript in preparation). The processing steps implemented within the Moirai pipeline included 1) clipping of poly-A tails and the “CTGTAGGCACCATCAAT” adaptor from the raw reads with FASTQ/A Clipper from the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), 2) removal of reads containing “N” and of reads similar to ribosomal RNA with rRNAdust version 1.02 (Lassmann et al., manuscript in preparation), and 3) mapping of the remaining reads to the hg19 human genome with TopHat (ref. 46; version 1.4.1), combining TopHat's de novo junction-finding mode with known exon-exon junctions extracted from GENCODE V10 and leaving all other parameters at their default values. Mapped reads flagged as PCR duplicates were removed, and the remaining TopHat-aligned reads were then assembled with Cufflinks (ref. 47; version 1.3.0) using default parameters.
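
To make the read clean-up in steps 1 and 2 concrete, the following is a minimal Python sketch of the logic only. It is not the FASTX-Toolkit or rRNAdust implementation and does not reproduce their options. The exact-match adaptor search, the order of the two clipping operations, and the `min_run` and `min_length` thresholds are all assumptions introduced for illustration. Ribosomal RNA filtering is omitted because it requires a reference sequence set.

```python
# Illustrative sketch only: mimics the clipping and "N" filtering described
# in the text. All thresholds below are hypothetical, not taken from the paper.

ADAPTER = "CTGTAGGCACCATCAAT"  # adaptor sequence quoted in the text


def clip_adapter(seq, adapter=ADAPTER):
    """Cut the read at the first exact adaptor occurrence (assumption: exact match)."""
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]


def trim_poly_a(seq, min_run=5):
    """Strip a trailing run of 'A' if it is at least min_run bases long.

    min_run is a hypothetical threshold chosen for this example.
    """
    stripped = seq.rstrip("A")
    return stripped if len(seq) - len(stripped) >= min_run else seq


def preprocess(reads, min_length=20):
    """Yield (name, sequence) pairs that survive clipping and filtering.

    reads is an iterable of (name, sequence) tuples; min_length is a
    hypothetical cut-off for reads left too short after clipping.
    """
    for name, seq in reads:
        # Assumed order: remove the 3' adaptor first, then the poly-A tail.
        seq = trim_poly_a(clip_adapter(seq.upper()))
        # Discard reads with ambiguous bases or too little sequence left.
        if "N" in seq or len(seq) < min_length:
            continue
        yield name, seq


if __name__ == "__main__":
    example_reads = [
        ("r1", "ACGT" * 6 + "AAAAAAAA" + ADAPTER + "GG"),  # adaptor + poly-A
        ("r2", "ACGTNACGTACGTACGTACGTACGT"),                # contains N
        ("r3", "ACGT" + ADAPTER),                           # too short once clipped
        ("r4", "GATTACA" * 4),                              # already clean
    ]
    for name, seq in preprocess(example_reads):
        print(name, seq)
```

In this toy input, only `r1` (after clipping) and `r4` (unchanged) pass the filters. In the actual pipeline the surviving reads would then be passed to TopHat for mapping.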