
Chunk #36 — ALGAE, FUNGI, NEMATODES AND PROTOZOA

Source
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

The ‘Small Eukaryotes’ designation refers to the pipeline's primary use: generating RefSeq genomes for eukaryotes whose genomes are relatively small compared to those of plants and vertebrates, such as algae, protozoa, fungi, nematodes, and some arthropods. However, some large plant genomes are also processed using this pipeline. The pipeline processes high-quality assemblies consisting of chromosomes and/or scaffolds and their components. Assemblies with high contig and scaffold N50, high-quality sequence, and reasonably good INSDC-submitted annotation are prioritized. This pipeline, which replaces a historical process flow that required more manual support, has only recently reached a public production phase and is already increasing the number of ‘small’ eukaryotic genomes represented in RefSeq. Work is ongoing to optimize pipeline throughput, add more automation, and further reduce curator processing tasks. Longer-term plans include implementing a protein-name management system to provide, correct, or improve on the INSDC-submitted names over time. Many of the genomes in scope for the small eukaryotes pipeline cannot currently be processed by the (large) eukaryotic genome annotation pipeline because of their taxonomic diversity and the limited availability of the transcript data needed to train the de novo annotation pipeline.
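
The text names contig and scaffold N50 as a prioritization criterion without defining it. As a minimal sketch, assuming the commonly used definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), N50 could be computed as follows. The function name `n50` and the sample lengths are illustrative only and are not taken from the RefSeq pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: walk the sequences from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length

    # Unreachable for a non-empty input, kept for completeness.
    return ordered[-1]


# Hypothetical contig lengths (arbitrary units) for illustration.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

For the hypothetical lengths shown, the two longest sequences already cover half of the total, so `example_n50` is 8. The same function applies to contig lengths or scaffold lengths; only the input list differs.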