
Chunk #23 — Evolution of the GENCODE gene set

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

The patterns of change in the GENCODE loci across releases 3c to 7 are reproduced at the transcript level. The data show that the vast majority (>75%) of transcripts are associated with protein-coding loci. While the total number of protein-coding loci is decreasing, the number of transcripts at coding loci is increasing with each release. The lncRNA transcript numbers are less stable than those of protein-coding loci and pseudogenes, both because lncRNAs are a relatively new area of study and because the methods used to identify them have changed, for example, identification based on chromatin signatures or on position relative to a protein-coding gene. A key change in lncRNA transcripts between releases 3c and 7 is the introduction of a more refined set of biotypes for Level 1 and 2 transcripts (see Supplemental Table 4). Specifically, the number of transcripts with the biotype processed_transcript fell significantly, and the numbers of transcripts with the antisense, lincRNA, noncoding and sense_intronic biotypes increased correspondingly.
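A per-release tally of transcripts by biotype, of the kind summarized above, could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it assumes GTF-style records in which transcript rows carry a `transcript_type` attribute, and the helper names (`make_line`, `count_transcript_biotypes`, `biotype_fractions`) and the sample records are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def make_line(feature, transcript_id, biotype):
    """Build one hypothetical GTF-style record (illustration only, not real data)."""
    attrs = (
        f'gene_id "G_{transcript_id}"; '
        f'transcript_id "{transcript_id}"; '
        f'transcript_type "{biotype}";'
    )
    return "\t".join(["chr1", "HAVANA", feature, "100", "900", ".", "+", ".", attrs])


def count_transcript_biotypes(gtf_text, attr="transcript_type"):
    """Count transcript rows per biotype.

    Assumption: the biotype is stored under the attribute named by `attr`;
    adjust it if a given release uses a different key.
    """
    counts = Counter()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level rows are tallied
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts[attrs.get(attr, "unknown")] += 1
    return counts


def biotype_fractions(counts):
    """Convert raw counts into the fraction of all transcripts per biotype."""
    total = sum(counts.values())
    return {biotype: n / total for biotype, n in counts.items()} if total else {}


# Made-up sample: three coding transcripts, one lincRNA, plus rows to be skipped.
SAMPLE_GTF = "\n".join([
    "##description: hypothetical sample",
    make_line("transcript", "T1", "protein_coding"),
    make_line("exon", "T1", "protein_coding"),
    make_line("transcript", "T2", "protein_coding"),
    make_line("transcript", "T3", "protein_coding"),
    make_line("transcript", "T4", "lincRNA"),
])

counts = count_transcript_biotypes(SAMPLE_GTF)
fractions = biotype_fractions(counts)
```

Running the same tally on the annotation file of each release and comparing the resulting counters would show trends such as a shrinking processed_transcript class alongside growing antisense and lincRNA classes.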