Chunk #41 — PROKARYOTES

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded: yes

Text

With increasing interest in human pathogens and advances in DNA sequencing technology, the number of sequenced prokaryotic genomes has increased rapidly over the last decade. Some bacterial strains are indistinguishable by current genotyping approaches, but whole-genome sequencing can detect minor genetic differences between them, which is useful for characterizing transmission pathways, identifying antibiotic resistance, and surveying outbreaks. To investigate food-borne pathogens or infection outbreaks, large numbers of nearly identical bacterial genomes have been sequenced and annotated in recent years, resulting in numerous identical protein sequences, each with its own accession number. In 2013, NCBI introduced a new protein data model and accession prefix (WP_) for the RefSeq collection. This change reduced redundancy in RefSeq prokaryotic proteins and made it easier to identify identical proteins found on more than one genome. It also allowed for an improved strategy for managing prokaryotic protein names. These non-redundant records represent unique prokaryotic protein sequences that are independent of any particular bacterial genome and may be annotated on multiple strains or species (www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/).
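The idea behind a non-redundant protein record can be illustrated with a minimal sketch: annotations from several genomes are grouped by exact sequence identity, so that each unique sequence is stored once and lists every genome on which it is annotated. This is only an illustration under stated assumptions, not NCBI's implementation. The function name `collapse_identical_proteins`, the strain labels, the toy sequences, and the use of a SHA-256 digest as the grouping key are all hypothetical choices made for this example.

```python
import hashlib


def collapse_identical_proteins(annotations):
    """Group per-genome protein annotations by exact sequence identity.

    `annotations` is an iterable of (genome_id, protein_sequence) pairs.
    Returns a dict mapping a sequence digest to a record that holds the
    sequence once, plus the set of genomes on which it is annotated.
    """
    records = {}
    for genome_id, sequence in annotations:
        # Normalize so that whitespace and letter case do not split a group.
        seq = sequence.strip().upper()
        # Hypothetical key: a digest of the sequence stands in for a
        # genome-independent identifier (real WP_ accessions are assigned
        # by NCBI and are not derived this way).
        key = hashlib.sha256(seq.encode("ascii")).hexdigest()
        record = records.setdefault(key, {"sequence": seq, "genomes": set()})
        record["genomes"].add(genome_id)
    return records


# Made-up strains and sequences, for illustration only.
annotations = [
    ("strain_A", "MKTAYIAKQR"),
    ("strain_B", "MKTAYIAKQR"),
    ("strain_C", "MKTAYIAKQR"),
    ("strain_C", "MSLNEQWERT"),
]

records = collapse_identical_proteins(annotations)

# Records annotated on more than one genome are the shared proteins.
shared = [r for r in records.values() if len(r["genomes"]) > 1]
```

Here four per-genome annotations collapse into two records, and the record shared by all three strains can be found by checking the size of its `genomes` set. Keeping the genome list on the record, rather than one accession per genome, is what removes the redundancy described above.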