Chunk #3 — INTRODUCTION — Improved hidden Markov Models and phylogenetic trees, and ortholog identification — Gene families covering fully sequenced genomes

Source
PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium.

Text

Previous versions of PANTHER focused on identifying subfamilies and the underlying functional divergence events. PANTHER 7 expands on this focus by supporting accurate ortholog identification and the annotation of gene families ‘at any point in gene family evolution’, not just at the major divergences. To meet these requirements, we made several important improvements to PANTHER. First, PANTHER trees aim to represent ‘all’ protein-coding genes from a phylogenetically diverse set of organisms. For PANTHER 7 trees, complete protein-coding gene sets for 48 organisms were constructed from several sources, in collaboration with the GO Consortium, using curated sources for model organism genomes where possible (Table 1). These sets can be downloaded at ftp://ftp.pantherdb.org/genome/pthr7.0. We took care to keep PANTHER family and subfamily accession numbers stable from the previous version 6.1 to 7.0. To define protein family membership, each PANTHER 7 protein sequence was scored against the HMMs from version 6.1 and assigned to the family with the highest HMM score. If the resulting protein family contained over 1000 sequences, we attempted to manually divide it
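The family-assignment rule described above (score each sequence against every family HMM, keep the highest-scoring family, then flag families with more than 1000 members) can be sketched as follows. This is a minimal illustration, not the PANTHER pipeline: the function names, the input layout (a dictionary of precomputed scores), and the handling of ties and of sequences with no scores are all assumptions, and the HMM scoring itself is taken as already done.

```python
from collections import defaultdict

# Size threshold stated in the text; families above it were candidates
# for manual division.
MAX_FAMILY_SIZE = 1000


def assign_to_families(scores):
    """Assign each sequence to its highest-scoring family.

    `scores` is a hypothetical input layout: it maps a sequence ID to a
    dict of {family ID: HMM score}. Returns {family ID: [sequence IDs]}.
    """
    families = defaultdict(list)
    for seq_id, family_scores in scores.items():
        if not family_scores:
            # Assumption: sequences with no scores are left unassigned.
            continue
        # On a tie, max() keeps the first family encountered (an assumption;
        # the text does not say how ties were resolved).
        best_family = max(family_scores, key=family_scores.get)
        families[best_family].append(seq_id)
    return dict(families)


def families_needing_review(families, max_size=MAX_FAMILY_SIZE):
    """Return the IDs of families whose member count exceeds `max_size`."""
    return sorted(fam for fam, members in families.items() if len(members) > max_size)


if __name__ == "__main__":
    # Toy scores with made-up sequence and family IDs.
    toy_scores = {
        "seq1": {"FAM_A": 120.5, "FAM_B": 40.0},
        "seq2": {"FAM_A": 15.2, "FAM_B": 88.1},
        "seq3": {},
    }
    assigned = assign_to_families(toy_scores)
    print(assigned)
    print(families_needing_review(assigned))
```

In this sketch the oversized families are only reported; the division step is left out because the text describes it as manual.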