Chunk #46 — Materials and Methods — LincRNA Discovery — Overlapping transcripts passing all filters at each expression cutoff were merged and grouped by proximity

Source: Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.
Embedded: yes

Text

To identify a minimal set of distinct lincRNAs, overlapping transcripts were merged if 50% of an exon of one transcript overlapped an exon of another transcript. Merged transcripts within 1 kb of each other were then placed in the same group but received distinct transcript numbers, and were named according to the FPKM expression cutoff from which they were derived, e.g., FPKM1_group_32871_transcript_1. Merging, grouping, and naming were performed separately on all FPKM>1 transcripts, FPKM>10 transcripts, and FPKM>30 transcripts. Filtering statistics are presented in Table S3. The catalog of merged lincRNAs at each expression cutoff is provided in BED format for genome build hg18 in Dataset S2. The FPKM>1 catalog of lincRNAs was used for all analyses in this study unless stated otherwise. The lincRNA annotations are provided for hg18 rather than hg19 because the UCSC Genome Browser currently has more data “tracks” available for hg18. However, users may readily convert the lincRNA annotations to hg19 or other genome builds with the LiftOver tool: http://genome.ucsc.edu/cgi-bin/hgLiftOver.
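
The merging, grouping, and naming steps described above can be illustrated with a minimal Python sketch. This is not the code used in the study; it rests on several assumptions that the text does not specify: transcripts are held as dictionaries with a chromosome and a list of half-open (start, end) exons, strand is ignored, the 50% criterion is read as "at least 50% of either exon", a merged transcript is represented only by its overall span, and "within 1 kb" is read as a gap of at most 1,000 bases. All function names are hypothetical.

```python
from collections import defaultdict


def overlap_fraction(exon, other):
    """Fraction of `exon` covered by `other`; both are half-open (start, end)."""
    shared = min(exon[1], other[1]) - max(exon[0], other[0])
    return max(0, shared) / (exon[1] - exon[0])


def should_merge(tx_a, tx_b, min_frac=0.5):
    """True if an exon of either transcript is >= min_frac covered by an exon of the other."""
    if tx_a["chrom"] != tx_b["chrom"]:
        return False
    return any(
        overlap_fraction(ea, eb) >= min_frac or overlap_fraction(eb, ea) >= min_frac
        for ea in tx_a["exons"]
        for eb in tx_b["exons"]
    )


def merge_transcripts(transcripts, min_frac=0.5):
    """Cluster transcripts linked by should_merge and return one span per cluster."""
    parent = list(range(len(transcripts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; acceptable for a sketch, not for a genome.
    for i in range(len(transcripts)):
        for j in range(i + 1, len(transcripts)):
            if should_merge(transcripts[i], transcripts[j], min_frac):
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, tx in enumerate(transcripts):
        clusters[find(i)].extend(tx["exons"])

    merged = [
        {
            "chrom": transcripts[root]["chrom"],
            "start": min(s for s, _ in exons),
            "end": max(e for _, e in exons),
        }
        for root, exons in clusters.items()
    ]
    return sorted(merged, key=lambda m: (m["chrom"], m["start"], m["end"]))


def group_and_name(merged, cutoff, max_gap=1000):
    """Number groups and transcripts; `merged` must be sorted by chrom and start."""
    named = []
    group_id, tx_id = 0, 0
    group_chrom, group_end = None, None
    for m in merged:
        # Start a new group on a new chromosome or when the gap exceeds max_gap.
        if group_chrom != m["chrom"] or m["start"] - group_end > max_gap:
            group_id += 1
            tx_id = 0
            group_chrom, group_end = m["chrom"], m["end"]
        else:
            group_end = max(group_end, m["end"])
        tx_id += 1
        name = f"FPKM{cutoff}_group_{group_id}_transcript_{tx_id}"
        named.append((m["chrom"], m["start"], m["end"], name))
    return named


def to_bed_lines(named):
    """Format records as four-column BED lines (chrom, start, end, name)."""
    return ["\t".join(map(str, rec)) for rec in named]
```

Running the three functions once per expression cutoff (for example, cutoff=1, 10, and 30 on the correspondingly filtered transcript sets) would mirror the separate processing of the FPKM>1, FPKM>10, and FPKM>30 catalogs.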