Chunk #33 — PLANTS

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded: yes

Text

annotation process by a combination of automated processing and manual review. Manual curation of plant transcript and protein data is currently provided for Zea mays and Solanum lycopersicum. The current curation focus entails extensive sequence review and is targeted toward resolving QA concerns in the current set of transcripts. Error resolution is focused on identifying and removing chimeric transcripts and redundant transcripts and genes, and on improving the quality of the represented sequence by assessing indels and mismatches among the RefSeq transcript, the genomic sequence, and orthologous data. For plants, we strive to provide a curated transcript and protein dataset that is consistent with the cultivar selected for genome sequencing and assembly. The curation protocol used for vertebrate data is also used for plants. Thus, RefSeq transcript records may be updated to be based on a different INSDC source sequence, or may be assembled from more than one INSDC sequence record, in order to provide a transcript from the preferred cultivar. If INSDC transcript data are not available for the cultivar used for the genome assembly, a RefSeq transcript may be generated from the assembled genomic sequence based on a combination of transcript or protein alignments, RNA-Seq, and/or published data. A second area of focus