Chunk #30 — Analyzing the protein-coding complement of the GENCODE 7 reference set

Source: GENCODE: the reference human genome annotation for The ENCODE Project.
Embedded: yes

Text

We analyzed the annotated CDS in GENCODE 7 using data from the APPRIS database (http://appris.bioinfo.cnio.es/). APPRIS defines the principal variant by combining protein structural and functional information with conservation information from related species. We used it to determine the proportion of transcripts that would generate functional isoforms with changes to their protein-coding features relative to the constitutional variant. Of the 84,408 transcripts annotated as translated in the GENCODE 7 release, 30,148 (35.7% of all transcripts or 47.3% of alternative transcripts) would generate protein isoforms with either fewer Pfam functional domains (Finn et al. 2010) or damaged Pfam domains relative to the constitutional variant of the same gene. Based on alignments with Protein Data Bank (PDB) structures, 26,955 isoforms (31.9% of all isoforms or 42.3% of alternative isoforms) would have lost or damaged structural domains. A total of 16,540 isoforms (19.6% of all isoforms or 26% of alternative isoforms) would lose functionally important residues.
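
The percentages quoted above can be re-derived from the counts, as in the minimal Python sketch below. The dictionary keys and function names are illustrative labels, not APPRIS field names. The text does not state the number of alternative transcripts, so the sketch back-calculates an implied value from each quoted percentage. That value is an estimate under this assumption, not a figure from the source.

```python
# Counts quoted in the text for the GENCODE 7 release.
TOTAL_TRANSLATED = 84_408

# Transcripts/isoforms affected, keyed by illustrative (hypothetical) labels.
AFFECTED = {
    "pfam_domains": 30_148,            # fewer or damaged Pfam domains
    "pdb_structural_domains": 26_955,  # lost or damaged structural domains
    "functional_residues": 16_540,     # lost functionally important residues
}

# Percentages of alternative transcripts as quoted in the text.
QUOTED_PCT_OF_ALTERNATIVE = {
    "pfam_domains": 47.3,
    "pdb_structural_domains": 42.3,
    "functional_residues": 26.0,
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def implied_total(count: int, quoted_percent: float) -> float:
    """Back-calculate the denominator implied by a count and its quoted percentage."""
    return count / (quoted_percent / 100)


# Share of all translated transcripts, recomputed from the counts.
pct_of_all = {name: percent(n, TOTAL_TRANSLATED) for name, n in AFFECTED.items()}

# Implied number of alternative transcripts (an estimate, not a source figure).
implied_alternative = {
    name: implied_total(n, QUOTED_PCT_OF_ALTERNATIVE[name])
    for name, n in AFFECTED.items()
}

if __name__ == "__main__":
    for name in AFFECTED:
        print(f"{name}: {pct_of_all[name]}% of all, "
              f"implied alternative total ~{implied_alternative[name]:,.0f}")
```

The three back-calculated denominators differ only through rounding of the quoted percentages, which suggests that they all refer to the same set of alternative transcripts.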