Discussion

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Interestingly, there is still uncertainty about the number of protein-coding and long noncoding loci in the genome. Assessing how many protein-coding loci are missing from the catalog is difficult, but our conservation-based analysis of coding potential indicates that the number is likely to be small, around 100 protein-coding genes. A similar figure was suggested by Lindblad-Toh et al. (2011), who recently reported the sequencing and comparative analysis of 29 eutherian mammals. However, a recent publication by Ingolia et al. (2011) suggests that ribosome profiling can now detect a new class of small "polycistronic" ribosome-associated coding RNAs that encode small proteins. They highlight that the majority of mouse lncRNAs predicted by Guttman et al. (2009) actually show translatability comparable to that of protein-coding genes. In addition, Cabili et al. (2011) found 2798 lincRNAs not present in GENCODE 4 by combining HBM RNA-seq with RNA-seq from eight further cell lines and tissues, totaling 4 billion reads. This indicates that many thousands of lncRNA loci remain to be added to the GENCODE catalog, and that completeness will depend on the sequencing depth and on the variety of tissues and cell lines sequenced.