Chunk #6 — INTRODUCTION — PANTHER protein library

Source: Large-scale gene function analysis with the PANTHER classification system.
Embedded: yes

Text

The core of the PANTHER system is a collection of phylogenetically defined protein families and subfamilies, generated by computational algorithms and curated by expert biologists using an extensive software system for associating ontology terms [1,3]. The current release contains over 640,000 proteins from 82 genomes, of which 79 are from the Reference Proteome Project (http://www.ebi.ac.uk/reference_proteomes/) (Figure 1). UniProt IDs are used as the primary protein identifiers. Each gene is represented by exactly one protein, chosen as the representative of that gene. In addition, UniProt ID mapping (http://www.uniprot.org/mapping/) is used to map the primary protein IDs to IDs from other databases and resources, which allows PANTHER to support a wider range of ID types (see Supported IDs in Box 1). The proteins are divided into 7729 families, each of which is represented by a phylogenetic tree, a hidden Markov model (HMM), and a multiple sequence alignment (MSA) (Figure 1). Protein family trees are constructed computationally from sequence data using a phylogenetic tree inference algorithm called GIGA [24]. Nodes in the tree, corresponding to common ancestors of extant family members, are annotated by expert