Chunk #9 — Results — Discovery of a Large Number of Novel LincRNAs

Source: Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.
Embedded: yes

Text

We combined this large set of RNA-seq data with previous noncoding RNA annotation sets to generate the most comprehensive catalog of lincRNAs (Figure 2A). To build this catalog, we first compiled known and putative annotated lincRNAs. We collected noncoding RNAs from public databases, including GENCODE v6, and from literature sources [16], [18], resulting in a set of 351,940 transcripts. In addition, we performed de novo transcriptome assembly on each of the RNA-seq datasets (Table S2), generating 6,833,809 de novo assembled transcripts. Both the previously annotated and the de novo assembled transcripts were filtered to remove those overlapping protein-coding genes, known non-lincRNA noncoding RNA genes, and pseudogenes. Transcripts longer than 200 nucleotides were further filtered to remove any that contained, or overlapped another transcript that contained, an open reading frame (ORF) longer than 100 amino acids. Out of concern that some de novo assembled transcripts may be unannotated extensions of neighboring protein-coding genes, as was recently observed for a fraction of GENCODE long noncoding RNAs [19], we created an additional filter to remove transcripts