Chunk #3 — INTRODUCTION

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

A unique aspect of the RefSeq dataset is its combined approach of computation, collaboration, and curation by NCBI scientific staff. As a large bioinformatics facility, NCBI has invested in developing robust process flows to generate annotation and perform quality assurance (QA) tests for eukaryotic and prokaryotic genomes, transcripts, and proteins. Improvements to the viral genomes process flow are in progress. The RefSeq group collaborates with numerous expert groups, including official nomenclature authorities (e.g. HUGO Gene Nomenclature Committee (HGNC) and Zebrafish Information Network (ZFIN) for human and zebrafish gene names, respectively), UniProtKB (protein names), and miRBase (microRNAs) (2–5). These and other collaborations help maintain and improve the quality of the RefSeq dataset through QA reports, exchanges of gene and sequence information, and exchanges of functional information. NCBI staff also provide curation support for viruses, prokaryotes, eukaryotes, organelles, plasmids, and targeted projects, including curating genes and sequences for Homo sapiens, Mus musculus, and other organisms. RefSeq curators improve the quality of the database through review of QA test results, involvement in the selection of certain inputs for genome annotation processing,