Chunk #13 — PROGRESS REPORT — UniProtKB annotation — Revisiting the human proteome

Source: The Universal Protein Resource (UniProt) in 2010.
Embedded: yes

Text

As part of the review process, we use information extraction tools such as the STRING database (12) to identify UniProt entries that are candidates for re-annotation. STRING is a meta-database that integrates information on functional protein interactions and assigns reliability scores to it, and as such provides a useful first-pass filter for prioritizing re-annotation. Another approach is to propagate annotation from well-characterized orthologs in closely related species (e.g. Mus musculus) to an uncharacterized human protein. Sequence update and review includes the merging of previously undescribed splice isoforms and polymorphisms, and the correction or removal of erroneous sequences by comparison to the reference human genome. We also continue to create records for newly discovered protein sequences and to delete spurious records that may correspond to pseudogenes or cloning artifacts. UniProt recently joined the Consensus CDS (CCDS) project (13), a collaborative effort to identify a core set of consistently annotated, high-quality human and mouse protein-coding regions. The long-term goal is to support convergence towards a standard set of gene and protein annotations. To date, UniProt has investigated