
Chunk #57 — Methods — Comparison of RefSeq, UCSC, AceView, and GENCODE transcripts

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

Transcripts from four data sets (GENCODE, RefSeq, UCSC, and AceView) were compared to assess the extent to which these data sets overlap. The releases compared were GENCODE 7, the July 2011 freeze of RefSeq and UCSC Genes, and the AceView 2010 release. First, the exon coordinates of all protein-coding transcripts and of all lncRNA transcripts were each compared among the data sets. For multi-exonic transcripts, the transcript boundaries were ignored, allowing some flexibility in the annotation of their 5′ and 3′ ends. A transcript was considered shared between two data sets if its exon coordinates were identical in both. Second, the CDS coordinates of protein-coding transcripts, including the intervening exon junctions, were compared, and an exact match was required for a CDS to be considered shared between two data sets. The overlaps between different combinations of data sets were represented graphically as three-way Venn diagrams using the Vennerable R package (https://r-forge.r-project.org/projects/vennerable/) and edited manually.
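The matching rules described above can be illustrated with a minimal Python sketch. It is not the authors' code, and the text does not specify their implementation. The function names (`transcript_key`, `cds_key`, `venn3_counts`) and the tuple-based exon representation are hypothetical. The sketch also assumes that "ignoring transcript boundaries" means dropping the start of the first exon and the end of the last exon while keeping every internal exon boundary.

```python
from typing import Dict, Hashable, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) in genomic coordinates


def transcript_key(chrom: str, strand: str, exons: List[Exon]) -> Hashable:
    """Build a comparison key from exon coordinates (assumed interpretation).

    Single-exon transcripts keep both ends. Multi-exonic transcripts drop
    the outermost start and end, so differing 5' and 3' ends still match.
    """
    exons = sorted(exons)
    if len(exons) == 1:
        return (chrom, strand, tuple(exons))
    first, last = exons[0], exons[-1]
    # None marks an outer transcript boundary that is ignored.
    trimmed = [(None, first[1])] + exons[1:-1] + [(last[0], None)]
    return (chrom, strand, tuple(trimmed))


def cds_key(chrom: str, strand: str, cds_segments: List[Exon]) -> Hashable:
    """CDS comparison requires an exact match, so every coordinate is kept."""
    return (chrom, strand, tuple(sorted(cds_segments)))


def venn3_counts(a: Set[Hashable], b: Set[Hashable], c: Set[Hashable]) -> Dict[str, int]:
    """Count the seven regions of a three-way Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Two multi-exonic transcripts that differ only at their outer ends.
k1 = transcript_key("chr1", "+", [(100, 200), (300, 400), (500, 600)])
k2 = transcript_key("chr1", "+", [(150, 200), (300, 400), (500, 650)])
print(k1 == k2)
```

Under these assumptions, each data set is reduced to a set of keys, and the region counts from `venn3_counts` are the numbers a three-way Venn diagram would display. Plotting itself (done with Vennerable in the source) is outside this sketch.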