Chunk #40 — Assessing coding potential of RNA-seq models using PhyloCSF

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

We analyzed the resulting transcripts that did not overlap any GENCODE loci for coding potential using PhyloCSF (Lin et al. 2011), which examines evolutionary signatures within UCSC vertebrate alignments, including 33 placental mammals. There were 136 Ensembl HBM models with positive PhyloCSF scores out of a total of 3689 loci, although only five of these had sufficient support for manual reannotation as coding genes (see Supplemental Table 8). The remaining 131 transcripts showed varying quality and evidence; ∼50% overlap novel processed transcripts and could result from misalignment of reads or represent genuinely expressed pseudogenes. Two hundred Scripture transcript predictions that were outside GENCODE but had high PhyloCSF scores were also manually examined. Of these, 15 were added as novel loci, and only nine were annotated as coding genes (see Supplemental Table 9); these will be added to the next release of GENCODE. Considering the depth of reads of the HBM data (averaging over a billion read depth) from the 16 different tissues, we have not identified many missing coding genes based on PhyloCSF. Indeed, since 3127 HBM Ensembl genes
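
The counts reported in this passage can be put in proportion with a short calculation. The sketch below is illustrative only: it uses just the numbers quoted in the text, the variable names and the `percent` helper are assumptions introduced here, and it is not part of the GENCODE analysis pipeline. Because the text does not say whether the nine coding genes are a subset of the 15 novel loci, both are expressed relative to the 200 examined predictions.

```python
# Counts quoted in the passage; all names here are hypothetical labels.
hbm_loci_total = 3689         # total loci stated for the Ensembl HBM models
hbm_positive = 136            # models with a positive PhyloCSF score
hbm_reannotated_coding = 5    # manually reannotated as coding genes

scripture_examined = 200      # high-scoring Scripture predictions examined
scripture_novel_loci = 15     # added as novel loci
scripture_coding = 9          # annotated as coding genes


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Positive-scoring models that were not reannotated as coding.
hbm_remaining = hbm_positive - hbm_reannotated_coding

print("HBM models with positive score:", percent(hbm_positive, hbm_loci_total), "%")
print("Positive HBM models reannotated as coding:",
      percent(hbm_reannotated_coding, hbm_positive), "%")
print("Remaining positive HBM models:", hbm_remaining)
print("Scripture predictions added as novel loci:",
      percent(scripture_novel_loci, scripture_examined), "%")
print("Scripture predictions annotated as coding:",
      percent(scripture_coding, scripture_examined), "%")
```

Under these figures, fewer than 4% of the HBM loci score positively, and fewer than 4% of those positive models were confirmed as coding on manual review, which is consistent with the passage's conclusion that few coding genes appear to be missing.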