Chunk #48 — Online Methods — 1. Data matrix, primary analysis and processing, quality control — 1.1 RNA-seq uniform processing and quantification for consolidated epigenomes

Source: Integrative analysis of 111 reference human epigenomes.
Embedded: yes

Text

We uniformly reprocessed mRNA-seq datasets from the 56 reference epigenomes that had RNA-seq data. For RNA-seq analysis, after library construction (ref. 44), we aligned 75 bp or 100 bp reads using the BWA aligner and, for strand-specific libraries, generated read coverage profiles separately for the positive and negative strands. We used several QC metrics for each RNA-seq library, including intron-exon ratio, intergenic read fraction, strand specificity (for stranded RNA-seq protocols), 3’-5’ bias, GC bias, and RPKM discovery rate (Table S1, RNAseqQCSummary sheet). We quantified exon and gene expression using a modified RPKM measure (ref. 8): the normalization factor in the RPKM calculation was the total number of reads aligned to coding exons, excluding reads from the mitochondrial genome, reads falling into genes coding for ribosomal proteins, and reads falling into the top 0.5% of expressed exons. The RPKM for a gene was calculated from the total number of reads aligned to all merged exons of that gene, normalized by the total exonic length. The resulting files contain RPKM values for all annotated exons, for coding and non-coding genes (excluding ribosomal genes), and for introns (Gencode V10 annotations were used). We also report the coordinates of all significant intergenic RNA-seq contigs that do not overlap annotated genes.
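
The modified RPKM normalization described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the `Exon` record, the function names, and the choice to rank "top expressed" exons by read density among the eligible coding exons are all assumptions made for illustration, since the text does not specify these details. The standard RPKM form (reads × 10^9 / (length in bp × normalization total)) is assumed.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    """Hypothetical per-exon record; field names are illustrative only."""
    gene: str
    length: int              # merged exon length in bp
    reads: int               # reads aligned into this exon
    coding: bool = True
    mito: bool = False       # exon lies on the mitochondrial genome
    ribosomal: bool = False  # exon belongs to a ribosomal protein gene


def normalization_total(exons, top_fraction=0.005):
    """Reads in coding exons, after the exclusions named in the text."""
    eligible = [e for e in exons
                if e.coding and not e.mito and not e.ribosomal]
    # Assumption: "top expressed" exons are ranked by read density (reads/bp).
    ranked = sorted(eligible, key=lambda e: e.reads / e.length, reverse=True)
    n_drop = int(len(ranked) * top_fraction)
    return sum(e.reads for e in ranked[n_drop:])


def rpkm(reads, length, total):
    """Reads per kilobase per million reads of the normalization total."""
    return reads * 1e9 / (length * total)


def gene_rpkm(exons, total):
    """Gene-level RPKM: summed exon reads over summed merged-exon length."""
    sums = {}
    for e in exons:
        reads, length = sums.get(e.gene, (0, 0))
        sums[e.gene] = (reads + e.reads, length + e.length)
    return {g: rpkm(r, l, total) for g, (r, l) in sums.items()}
```

In this sketch the exclusions affect only the normalization total. Exon-level values come from calling `rpkm` on each exon with that same total, and gene-level values pool reads and lengths across a gene's merged exons before normalizing.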