Chunk #32 — PLANTS

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded: yes

Text

RefSeq continues to expand the diversity of plant species represented in the dataset. To date, 61 plant species have been included in the RefSeq genomes dataset (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/), of which 33 were annotated through the eukaryotic genome annotation pipeline; the remainder are RefSeq copies of annotated genomes submitted to INSDC. In the future, more plant genomes selected for RefSeq inclusion will be processed by the eukaryotic genome annotation pipeline rather than by propagating annotation from the INSDC submission. This is a change of policy for the RefSeq plant genomes and will result in greater overall consistency of plant annotation data within the RefSeq dataset. The majority of the RefSeq transcripts and proteins available for plant species are ‘model’ records (XM_, XP_ and XR_ accessions; Table 1), with a smaller subset of ‘known’ records (NM_, NR_ and NP_ accessions) that are maintained independently of the annotation process by a combination of automated processing and manual review. Manual curation of plant transcript and protein data is currently provided for Zea mays and Solanum lycopersicum. The current curation focus entails extensive sequence review and is targeted toward