
Chunk #13 — ACCESSING THE REFSEQ DATASET

Source
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

RefSeq data can also be downloaded from the genomes FTP site. In August 2014, NCBI announced a major reorganization of this site, which now provides assembly- and organism-based access to both GenBank and RefSeq genomes (RefSeq genomes are at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/). This directory is further divided into subdirectories based on the same groups used in the RefSeq release, each of which is subdivided by species. The genomes FTP site provides files for all RefSeq genome assemblies reported in NCBI's Assembly resource (www.ncbi.nlm.nih.gov/assembly/). Its advantage is that the data can be accessed in an assembly- or organism-specific manner. The data provided include genome and product (transcript/protein) sequences, annotation, assembly reports and statistics, and MD5 checksums; they are updated whenever the genome assembly or its annotation is updated. This area does not include RefSeq sequences that fall outside the scope of a genome assembly, or products that are not annotated on a genome.
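Because the site publishes MD5 checksums alongside the data files, a download can be checked for corruption before use. The following is a minimal sketch, not an NCBI tool. It assumes the checksum manifest is plain text with one `<md5-hex>  ./<relative path>` entry per line (the layout written by the common `md5sum` utility) and that the files have already been downloaded into a local directory. The function names `parse_md5_manifest`, `md5_of_file`, and `verify_download` are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path


def parse_md5_manifest(text):
    """Parse '<md5-hex>  <path>' lines into a {relative_path: md5_hex} dict.

    Assumption: one entry per line, in md5sum-style layout.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        name = name.strip()
        if name.startswith("./"):
            name = name[2:]  # drop the leading './' so paths join cleanly
        expected[name] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory, manifest_text):
    """Return {relative_path: bool}; True means the file exists and matches."""
    directory = Path(directory)
    results = {}
    for name, digest in parse_md5_manifest(manifest_text).items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == digest
    return results
```

A typical use would be to read the manifest that accompanies an assembly, call `verify_download(local_dir, manifest_text)`, and re-download any file whose entry is `False`. Since the files are replaced when an assembly or its annotation is updated, a mismatch may also mean that the local copy and the manifest come from different versions.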