Chunk #55 — FUTURE DIRECTIONS

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded: yes

Text

For prokaryotic genomes, we continue to refine aspects of the structural annotation generated by the Prokaryotic Genome Annotation Pipeline. Our new approach to managing functional information is still under development and will be described elsewhere. We anticipate re-annotating the entire RefSeq prokaryotic genome dataset when new versions of our prokaryotic annotation pipeline become available, in order to improve structural annotation. The decision to annotate all RefSeq prokaryotes using a single method, together with the sheer volume of this dataset, necessitates a different approach that leverages multiple sources of evidence to provide functional information. Protein names will be updated on an ongoing basis, organized by protein families or categories of evidence type. Our goals for the coming year include greater integration of Rfam (65) in our annotation pipeline, expanded collaboration, improved protein names, and reporting of supporting evidence on the protein sequence record.