Chunk #19 — VERTEBRATES — RefSeqGene project

Source: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

Text

The RefSeqGene sub-project defines human genomic sequences to be used as reference standards for well-characterized genes, particularly by the clinical genetics community. These sequences serve as a stable foundation for reporting pathogenic variants, for establishing conventions for numbering exons and introns, and for defining the coordinates of other variants. Each RefSeqGene record focuses on a gene-specific genomic region and is typically annotated with a subset of RefSeq transcripts and proteins selected by domain experts. The selected transcripts determine the exon features. Alignments of older versions of the canonical RefSeq transcript/protein, as well as of other known RefSeqs, are included. These records typically include 5 kilobases (kb) of sequence upstream of the focus gene and 2 kb of sequence downstream, to support representation of potential regulatory sites or of deletions extending beyond the gene feature. A RefSeqGene record may also include annotation for other genes located within its boundaries. RefSeqGene records are initially reviewed by locus-specific databases and NCBI staff. RefSeqGene is a member of the LRG collaboration (7), which provides additional review of the sequence data before adding an LRG