
Chunk #12 — ACCESSING THE REFSEQ DATASET

Source
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

RefSeq data are distributed by FTP through two sites: refseq (ftp://ftp.ncbi.nlm.nih.gov/refseq/) and genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/). The refseq FTP site provides daily updates of all new and updated RefSeq records, weekly updates of some data types, and a comprehensive bi-monthly RefSeq release (/refseq/release/). In addition, select organism-specific transcript and protein datasets, including those for human and mouse, are updated weekly. The RefSeqGene subdirectory is updated daily, with alignments to the genome released with each annotation run.

The comprehensive bi-monthly RefSeq release is organized by taxonomic groupings (e.g. vertebrate mammals) or other groupings (e.g. mitochondria). Data for the entire RefSeq collection may also be downloaded from the /refseq/release/complete/ directory. The release offers an advantage for those who want to maintain periodic updates of either the complete collection or a single group. It also includes records that are not available from the companion genomes FTP site, such as transcripts that are maintained independently of, and may not currently be annotated on, a genome assembly. The release is accompanied by extensive documentation of the installed files (/refseq/release/release-catalog/), including MD5 checksums and a list of all installed files, as well as release notes and announcements (/refseq/release/release-notes/).
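
The MD5 checksums in the release catalog can be used to confirm that downloaded release files arrived intact. The following is a minimal Python sketch of that check, not an NCBI-provided tool. It assumes the checksum file has already been downloaded and that each of its lines has the form `<md5> <filename>`; the text does not state the checksum file's name or exact layout, so both are assumptions. The function names (`md5_of_file`, `parse_checksum_lines`, `verify_directory`) are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_lines(text):
    """Parse '<md5> <filename>' lines (an assumed layout) into {filename: md5}."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines that do not match the assumed layout
        checksum, name = parts
        # Keep only the base file name; a leading '*' (md5sum binary marker) is dropped.
        expected[Path(name.lstrip("*")).name] = checksum.lower()
    return expected


def verify_directory(directory, checksum_text):
    """Return {filename: status}, where status is 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in parse_checksum_lines(checksum_text).items():
        local = Path(directory) / name
        if not local.is_file():
            results[name] = "missing"
        elif md5_of_file(local) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

A caller would pass the local download directory and the contents of the checksum file, then re-download any file reported as `mismatch` or `missing`. Reading files in chunks keeps memory use flat, which matters for large release archives.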