Chunk #11 — Results — Discovery of a Large Number of Novel LincRNAs

Source: Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.
Embedded: yes

Text

In a final step, we removed transcripts expressed below 1 fragment per kilobase of transcript per million mapped reads (FPKM<1), a threshold approximately equivalent to one copy per cell [21] (Table S1). To decrease redundancy, and with the goal of identifying lincRNA “genes” rather than potentially redundant overlapping “transcripts”, the remaining transcripts were merged if they shared at least one exon (see Methods), resulting in 53,864 distinct putative lincRNAs at FPKM>1, 3,676 lincRNAs at FPKM>10, and 925 lincRNAs at FPKM>30 (Dataset S2 and Figure S3). Surprisingly, more than 94% of the final set of merged lincRNAs at each expression level consists exclusively of novel de novo assembled transcripts discovered from the RNA-seq data in this study (Table S3 and Dataset S2). Rather than being clustered near currently annotated genes, these lincRNAs are spread throughout intergenic sequence: 58.1% of FPKM>1 lincRNAs, 61.9% of FPKM>10 lincRNAs, and 67.7% of FPKM>30 lincRNAs are more than 30 kilobases from the nearest protein-coding gene on either strand. We annotated the lincRNAs as belonging to the same “group” (see Methods) if they are within 1 kilobase