Two filtering steps were performed to remove putative protein-coding transcripts and their UTRs. First, large ORFs (>100 amino acids) were identified in all reading frames of all transcripts using EMBOSS getorf v6.1.0. To account for potentially truncated ORF-containing transcripts in which the start or stop codon may lie outside the annotated region, the presence of greater than 300 nt downstream of a start codon without an interrupting stop codon, or 300 nt upstream of a stop codon without an interrupting start codon, sufficed to call a putative ORF. Transcripts with putative large ORFs were removed. These putative large-ORF-containing intergenic transcripts, some of which may be novel protein-coding genes, are provided as a resource in Dataset S10. Second, to remove potential UTRs of these large-ORF-containing transcripts from the lincRNA catalog, the remaining transcripts were filtered to remove any that overlapped a large-ORF-containing transcript.
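The ORF-calling rule described above can be illustrated with a minimal Python sketch. It is not the EMBOSS getorf procedure used in the analysis, only one possible reading of the stated criteria. The function names (`has_putative_large_orf`, `frame_has_putative_orf`) are hypothetical. The sketch also makes several assumptions: "all reading frames" means three frames on each strand, ATG is the only start codon, the standard stop codons apply, the "greater than 300 nt" threshold applies to both criteria, and the window upstream of a stop codon must contain no other stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}   # assumption: standard stop codons
START = "ATG"                   # assumption: ATG is the only start codon
MIN_CODONS = 100                # 300 nt, i.e. 100 amino acids

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def codons_in_frame(seq, offset):
    """Split a sequence into complete codons starting at the given offset."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def frame_has_putative_orf(codons, min_codons=MIN_CODONS):
    """Apply both criteria to the codons of a single reading frame."""
    open_start = None   # earliest start codon since the last stop codon
    last_break = -1     # index of the most recent start or stop codon
    for i, codon in enumerate(codons):
        if codon in STOPS:
            # Criterion 2: more than min_codons codons directly upstream of
            # this stop with no start codon (and, by assumption, no stop).
            if i - last_break - 1 > min_codons:
                return True
            open_start = None
            last_break = i
        else:
            if codon == START:
                if open_start is None:
                    open_start = i
                last_break = i
            # Criterion 1: more than min_codons codons downstream of a start
            # codon with no interrupting stop codon.
            if open_start is not None and i - open_start > min_codons:
                return True
    return False


def has_putative_large_orf(seq, min_codons=MIN_CODONS):
    """Check all six reading frames (assumption: both strands are searched)."""
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            if frame_has_putative_orf(codons_in_frame(strand, offset), min_codons):
                return True
    return False
```

In this sketch, a transcript for which `has_putative_large_orf` returns `True` would be set aside as a putative large-ORF-containing transcript. The remaining transcripts would then go through the overlap filter described in the second step.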