Chunk #4 — GENCODE gene merge process

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

The genes in the GENCODE reference gene set are classified into three levels according to their type of annotation. Level 1 comprises transcripts that have been manually annotated and experimentally validated by RT-PCR-seq (Howald et al. 2012), as well as pseudogenes validated by three-way consensus, that is, independently validated by three different strategies. Level 2 comprises manually annotated transcripts: some have been merged with models produced by the Ensembl automatic pipeline, while others are annotated by HAVANA only. Level 3 comprises transcripts and pseudogene predictions arising from Ensembl's automated annotation pipeline. GENCODE 7 consists of 9,019 transcripts at Level 1, 118,657 transcripts at Level 2, and 33,699 transcripts at Level 3. Many of the protein-coding genes in Level 3 are contributed by Ensembl's genome-wide annotation in regions where HAVANA has not yet provided manual annotation.
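
To put the three transcript counts in proportion, the short Python sketch below totals them and computes each level's share of GENCODE 7. The counts are taken from the paragraph above. The names `Level`, `GENCODE7_TRANSCRIPTS`, and `level_shares` are hypothetical and introduced only for illustration; they are not part of any GENCODE tool or file format.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical labels for the three GENCODE annotation levels."""
    VALIDATED = 1  # manual annotation plus experimental validation, or three-way consensus pseudogenes
    MANUAL = 2     # manual annotation (HAVANA), some merged with Ensembl models
    AUTOMATED = 3  # Ensembl automated annotation pipeline


# Transcript counts per level as reported for GENCODE 7 in the text.
GENCODE7_TRANSCRIPTS = {
    Level.VALIDATED: 9_019,
    Level.MANUAL: 118_657,
    Level.AUTOMATED: 33_699,
}


def level_shares(counts):
    """Return each level's fraction of the total transcript count."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


total = sum(GENCODE7_TRANSCRIPTS.values())
shares = level_shares(GENCODE7_TRANSCRIPTS)

for level in Level:
    count = GENCODE7_TRANSCRIPTS[level]
    print(f"Level {level.value}: {count:>7,} transcripts ({shares[level]:.1%})")
print(f"Total:   {total:>7,} transcripts")
```

Under these figures the three levels sum to 161,375 transcripts, with Level 2 (manual annotation) accounting for roughly three quarters of the set.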