Chunk #53 — FUTURE DIRECTIONS

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded: yes

Text

The RefSeq project is unique in offering a reference sequence dataset of transcripts, proteins, and genomes that encompasses all kingdoms of life and is actively maintained and updated to incorporate improved computational strategies, new data types, and new knowledge. We have demonstrated the capacity to respond to the recent rapid increase in the number of sequenced genomes submitted to INSDC databases. We have defined a diverse set of policies and strategies for the curation and annotation of eukaryotic, prokaryotic, and viral species to meet the different needs of organism-specific communities. The RefSeq dataset is widely used as a reference standard for many different analyses, including human and pathogen clinical applications, comparative genomics, expression assays, sequence variation interpretation, and array and probe construction. At NCBI, the RefSeq dataset is integrated into multiple resources, including Assembly, BLAST, Epigenomics, Gene (where RefSeq annotation is the primary basis for most Gene entries), Genome, dbSNP, dbVar, Variation Viewer, and more.