Chunk #0 — INTRODUCTION

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

For the past 15 years, the National Center for Biotechnology Information (NCBI) RefSeq database has served as an essential resource for genomic, genetic, and proteomic research. By providing curated, stable, annotated reference genomes, transcripts, and proteins for selected viruses, microbes, organelles, and eukaryotic organisms, the RefSeq project has allowed researchers to focus on the best representative sequence data, in contrast to the redundant data in GenBank, and to reference specific genetic sequences unambiguously. The RefSeq collection provides explicitly linked genome, transcript, and protein sequence records that incorporate publications, informative nomenclature, and standardized and expanded feature annotations. RefSeq records are integrated into NCBI's resources, including the Nucleotide, Protein, and BLAST databases, and can be easily identified by the keyword ‘RefSeq’ and by their distinct accession prefixes, which define their type (Table 1). All RefSeq data are subject to quality assurance (QA) checks, with some specialized QA tests developed for different taxa or data types. For example, all viral RefSeqs undergo taxonomic review by NCBI staff before public release. RefSeq accessions are widely cited in scientific publications and genetic databases because