
Chunk #2 — INTRODUCTION

Source
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
Embedded
yes

Text

In recent years, advanced sequencing techniques have facilitated a substantial increase in whole-genome assembly submissions to the public databases. As a result, the RefSeq project has correspondingly expanded the depth and breadth of taxa included in the dataset, primarily through improvements to several in-house annotation pipelines. All taxa are in scope for RefSeq inclusion; however, annotation is often limited to those organisms for which a high-quality primary genome assembly is available with uncontested organism information. Thus, we may exclude some categories of data that do not meet our quality standards. Excluded datasets include metagenomes, assemblies with low contig N50 values or an especially high number of unplaced scaffolds/contigs (i.e., high fragmentation), and genomes that have significant mismatch or indel variation compared to other closely related genomes for the species (e.g., some prokaryotes).
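
The exclusion criteria above can be illustrated with a small sketch. The text does not give numeric cutoffs, so the thresholds below (`MIN_CONTIG_N50`, `MAX_UNPLACED`) are hypothetical placeholders, as are the `Assembly` record and the `exclusion_reasons` helper; none of them reflects the actual RefSeq pipeline. The third criterion (mismatch or indel variation relative to closely related genomes) requires a genome comparison and is not modeled here. Only the contig N50 calculation follows the standard definition.

```python
from dataclasses import dataclass
from typing import List


def contig_n50(lengths: List[int]) -> int:
    """Standard N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


@dataclass
class Assembly:
    """Hypothetical record for illustration; not an NCBI data structure."""
    name: str
    contig_lengths: List[int]
    unplaced_scaffolds: int
    is_metagenome: bool = False


# Hypothetical cutoffs: the source states no numeric thresholds.
MIN_CONTIG_N50 = 1000
MAX_UNPLACED = 500


def exclusion_reasons(asm: Assembly,
                      min_n50: int = MIN_CONTIG_N50,
                      max_unplaced: int = MAX_UNPLACED) -> List[str]:
    """Return the reasons an assembly would be excluded (empty if none)."""
    reasons = []
    if asm.is_metagenome:
        reasons.append("metagenome")
    if contig_n50(asm.contig_lengths) < min_n50:
        reasons.append("low contig N50")
    if asm.unplaced_scaffolds > max_unplaced:
        reasons.append("high fragmentation")
    return reasons


if __name__ == "__main__":
    good = Assembly("good", [5000, 4000, 3000], unplaced_scaffolds=10)
    bad = Assembly("bad", [300, 200, 100], unplaced_scaffolds=900,
                   is_metagenome=True)
    for asm in (good, bad):
        print(asm.name, exclusion_reasons(asm))
```

Returning a list of reasons rather than a single boolean keeps each criterion visible, which matches the way the text enumerates separate categories of excluded data.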