Chunk #26 — Assessing the completeness of transcript structures in the GENCODE 7 set

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

To assess the consistency in the structure of annotated gene models between releases, the number of exons per transcript was plotted for all splice variants at protein-coding and lncRNA loci in releases 3c and 7 (see Fig. 2). Although the number of annotated transcripts has increased in release 7, the distribution of exons per transcript indicates that the models themselves are very consistent in structure. Transcripts annotated at protein-coding loci show a peak at four exons per transcript, while lncRNAs, despite the large increase in their numbers between 3c and 7, show a similarly stable pattern with a distinct peak at two exons. This analysis confirms a high degree of homogeneity in the structure of transcripts annotated between releases 3c and 7. While the exon structure of annotated transcripts is stable, the annotation of UTRs in those models differs between releases 3c and 7 (Fig. 6). Both the mean 5′ UTR and mean 3′ UTR lengths increase with each release between 3c and 7, with the mean 5′ UTR more than 41 bases longer
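
The exons-per-transcript comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline. It assumes GTF-style input in which each exon row carries `transcript_id` and `transcript_type` attributes. The toy records, the `make_exons` helper and the `lincRNA` label are hypothetical stand-ins, not real GENCODE data.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
TRANSCRIPT_TYPE = re.compile(r'transcript_type "([^"]+)"')


def make_exons(tid, ttype, n):
    """Build n toy GTF exon rows for one transcript (hypothetical data)."""
    return [
        f'chr1\tHAVANA\texon\t{100 + 1000 * i}\t{200 + 1000 * i}\t.\t+\t.\t'
        f'gene_id "G_{tid}"; transcript_id "{tid}"; transcript_type "{ttype}";'
        for i in range(n)
    ]


def exons_per_transcript(gtf_text):
    """Return {transcript_type: Counter({exon_count: number_of_transcripts})}."""
    exon_counts = Counter()  # transcript_id -> number of exon rows seen
    types = {}               # transcript_id -> transcript_type
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features contribute to the count
        tid = TRANSCRIPT_ID.search(fields[8])
        if tid is None:
            continue
        ttype = TRANSCRIPT_TYPE.search(fields[8])
        exon_counts[tid.group(1)] += 1
        types[tid.group(1)] = ttype.group(1) if ttype else "unknown"

    distribution = {}
    for transcript, n_exons in exon_counts.items():
        distribution.setdefault(types[transcript], Counter())[n_exons] += 1
    return distribution


def peak(counter):
    """Exon count shared by the largest number of transcripts."""
    return counter.most_common(1)[0][0]


# Toy input: three protein-coding and three lincRNA transcripts.
SAMPLE_GTF = "\n".join(
    make_exons("T1", "protein_coding", 4)
    + make_exons("T2", "protein_coding", 4)
    + make_exons("T3", "protein_coding", 2)
    + make_exons("L1", "lincRNA", 2)
    + make_exons("L2", "lincRNA", 2)
    + make_exons("L3", "lincRNA", 3)
)

if __name__ == "__main__":
    for ttype, counts in exons_per_transcript(SAMPLE_GTF).items():
        print(ttype, dict(counts), "peak:", peak(counts))
```

Running the same function on the annotation files of two releases and comparing the resulting distributions per transcript type would give the kind of release-to-release comparison the text describes, under the assumption that both files use the same attribute names.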