Chunk #13 — RESULTS — New species and improved gene annotations

Source
Ensembl 2009.

Text

The other major focus has been the ongoing improvement of the human geneset in collaboration with other groups. Ensembl, together with the Sanger Institute HAVANA group (19), is part of multiple collaborations to refine the human geneset, including the CCDS (Consensus Coding Sequence) consortium, with RefSeq at NCBI (20) and UCSC (1), and the new ENCODE scale-up project GENCODE (http://www.sanger.ac.uk/encode/), with multiple collaborators. CCDS (http://www.ncbi.nlm.nih.gov/CCDS/) is a stable set of protein-coding gene structures on which all consortium members agree down to the base pair. Since our previous report (14), the human CCDS set has increased from 18 290 to 20 159 CDSs (about 10%), which represents an increase from 16 003 to 17 052 genes with at least one CCDS entry. There is also a CCDS set for mouse, which has grown proportionally more, from 13 374 to 17 707 CDSs (about 32%) and from 13 014 to 16 889 genes. GENCODE builds on CCDS to validate additional transcripts and extend into UTR regions, drawing on the ENCODE pilot project (9,21–23) and incorporating additional computational and experimental input and validation (24). One new computational