Chunk #4 — GENCODE gene merge process

Source: GENCODE: the reference human genome annotation for The ENCODE Project.

The genes in the GENCODE reference gene set are classified into three levels according to their type of annotation. Level 1 comprises transcripts that have been manually annotated and experimentally validated by RT-PCR-seq (Howald et al. 2012), as well as pseudogenes that have been validated by three-way consensus, that is, validated independently by three different strategies. Level 2 comprises transcripts that have been manually annotated; some Level 2 transcripts have been merged with models produced by the Ensembl automatic pipeline, while others are annotated by HAVANA only. Level 3 comprises transcripts and pseudogene predictions arising from Ensembl's automated annotation pipeline. GENCODE 7 consists of 9019 transcripts at Level 1, 118,657 at Level 2, and 33,699 at Level 3. Many of the protein-coding genes at Level 3 are contributed by Ensembl's genome-wide annotation in regions where HAVANA has not yet provided manual annotation.
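The level scheme can be read as a simple decision rule, and the GENCODE 7 counts give the share of transcripts at each level. The following is a minimal sketch, not GENCODE's actual assignment procedure: the `Transcript` flags (`manually_annotated`, `rt_pcr_seq_validated`, `is_pseudogene`, `three_way_consensus`) and the helpers `assign_level` and `level_shares` are hypothetical names introduced for illustration, and the real pipeline applies more criteria than these four booleans.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The three GENCODE annotation levels described in the text."""
    VALIDATED = 1  # manual + RT-PCR-seq validated, or three-way-consensus pseudogene
    MANUAL = 2     # manual annotation (HAVANA only, or merged with Ensembl models)
    AUTOMATED = 3  # Ensembl automated annotation only


@dataclass
class Transcript:
    # Hypothetical flags for illustration; not fields of any GENCODE file format.
    manually_annotated: bool
    rt_pcr_seq_validated: bool = False
    is_pseudogene: bool = False
    three_way_consensus: bool = False


def assign_level(t: Transcript) -> Level:
    """Apply a simplified version of the three-level rule to one transcript."""
    # Level 1: pseudogenes confirmed by three independent strategies.
    if t.is_pseudogene and t.three_way_consensus:
        return Level.VALIDATED
    # Level 1: manually annotated and experimentally validated.
    if t.manually_annotated and t.rt_pcr_seq_validated:
        return Level.VALIDATED
    # Level 2: manually annotated but not experimentally validated.
    if t.manually_annotated:
        return Level.MANUAL
    # Level 3: everything else comes from the automated pipeline.
    return Level.AUTOMATED


# Transcript counts per level for GENCODE 7, as given in the text.
GENCODE7_COUNTS = {
    Level.VALIDATED: 9019,
    Level.MANUAL: 118_657,
    Level.AUTOMATED: 33_699,
}


def level_shares(counts: dict[Level, int]) -> tuple[int, dict[Level, float]]:
    """Return the total transcript count and each level's fraction of it."""
    total = sum(counts.values())
    return total, {level: n / total for level, n in counts.items()}


if __name__ == "__main__":
    total, shares = level_shares(GENCODE7_COUNTS)
    print(f"GENCODE 7 transcripts: {total}")
    for level, share in shares.items():
        print(f"Level {level.value}: {share:.1%}")
```

Run on the GENCODE 7 figures, the arithmetic shows that manually annotated, unvalidated transcripts (Level 2) make up roughly three quarters of the set, while experimentally validated ones (Level 1) are a small minority.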