Chunk #34 — Methods — Expression analysis using RNA-seq data

Source
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes.
Embedded
yes

Text

We used RNA-seq data from the Gm12878, H1-hESC, K562 and HUVEC cell lines to check model performance when the expression levels of lncRNAs and protein-coding genes are similar. We used the mappings provided by ENCODE (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/) and quantified expression levels as RPKM (reads per kilobase of exon per million mapped reads) [79] using FluxCapacitor [80]. We excluded all transcripts with RPKM = 0. To identify lncRNA and protein-coding genes with similar expression distributions, for each lncRNA we selected a protein-coding gene with the nearest expression value, provided that it differed by no more than 1% of its expression level (Text S1, Methods section). In this way we secured a one-to-one correspondence between lncRNA genes and protein-coding genes matched by expression level, thus avoiding any expression bias between the two gene classes (Figure S6, Dataset S3).
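The quantification and matching steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the exact procedure is given in Text S1). The function names `rpkm` and `match_by_expression` are hypothetical. The sketch assumes that the 1% tolerance is taken relative to the lncRNA expression value, that each protein-coding gene is used at most once, and that lncRNAs are processed greedily in order of increasing expression.

```python
from bisect import bisect_left


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def match_by_expression(lncrna, coding, tolerance=0.01):
    """Pair each lncRNA with the nearest unused protein-coding gene.

    lncrna, coding: dicts mapping gene ID -> RPKM.
    tolerance: maximum allowed difference, as a fraction of the lncRNA
               RPKM (assumption: the source does not state the reference).
    Returns a dict mapping lncRNA ID -> protein-coding gene ID.
    """
    # Drop unexpressed genes (RPKM = 0) and sort the pool by expression.
    pool = sorted((v, g) for g, v in coding.items() if v > 0)
    pairs = {}
    # Greedy pass in order of increasing lncRNA expression (assumption).
    for gene, value in sorted(lncrna.items(), key=lambda kv: kv[1]):
        if value <= 0:
            continue
        # The nearest remaining candidate is adjacent to the insertion point.
        i = bisect_left(pool, (value,))
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(pool[j][0] - value))
        if abs(pool[best][0] - value) <= tolerance * value:
            pairs[gene] = pool[best][1]
            del pool[best]  # enforce one-to-one matching
    return pairs
```

Under these assumptions, an lncRNA with no protein-coding gene within the tolerance is left unmatched, so the resulting lncRNA and protein-coding sets have equal size and closely similar expression distributions.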