
Chunk #5 — INTRODUCTION — Improved hidden Markov Models and phylogenetic trees, and ortholog identification — Improved multiple sequence alignments and HMMs

Source
PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium.

Text

A multiple sequence alignment was constructed for each family using the MAFFT program (6), and a phylogenetic tree was estimated from the protein multiple alignment. Subfamily identifiers from version 6.1 were then ‘forward tracked’ to ancestral nodes in the version 7.0 trees whenever possible. In addition, in many cases subfamily boundaries were refined during manual curation, owing to improvements in the phylogenetic trees in PANTHER 7 (see below). After manual review and, where necessary, correction of the locations of both forward tracked and new subfamilies, a new HMM was constructed for each family and subfamily. We modified our existing HMM construction process (7) to make use of the multiple alignment from MAFFT. For PANTHER 7, we took the relevant sequences from the MAFFT alignment, trimmed the resulting subalignment to include as match states only those columns aligned by ≥30% of the sequences in the subalignment [sequences were weighted using the same technique as in (1)], and used the trimmed subalignment to construct an initial model using the modelfromalign program in SAM3.1. We then used this initial model as input, in addition to the sequences
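
The column-selection rule described above (keep as match states only those columns aligned by ≥30% of the weighted sequences) can be illustrated with a minimal Python sketch. This is not the PANTHER pipeline code. The function names `match_columns` and `trim_alignment`, the gap symbols, and the default uniform weights are assumptions made for illustration. The weighting technique of reference (1) is not reproduced here, and the sketch simply drops the unselected columns, which may differ from how the actual process treats them.

```python
from typing import List, Optional, Sequence

# Assumed gap symbols; the source text does not specify them.
GAP_CHARS = {"-", "."}


def match_columns(
    alignment: Sequence[str],
    weights: Optional[Sequence[float]] = None,
    min_fraction: float = 0.30,
) -> List[int]:
    """Return indices of columns whose weighted non-gap fraction is >= min_fraction."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if weights is None:
        # Placeholder assumption: uniform weights stand in for the
        # sequence-weighting scheme cited in the text.
        weights = [1.0] * len(alignment)
    if len(weights) != len(alignment):
        raise ValueError("need exactly one weight per sequence")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    kept = []
    for col in range(width):
        # Sum the weights of sequences that have a residue in this column.
        occupied = sum(
            w for row, w in zip(alignment, weights) if row[col] not in GAP_CHARS
        )
        if occupied / total >= min_fraction:
            kept.append(col)
    return kept


def trim_alignment(alignment: Sequence[str], columns: Sequence[int]) -> List[str]:
    """Keep only the selected columns of each aligned sequence."""
    return ["".join(row[c] for c in columns) for row in alignment]


# Toy subalignment (made-up sequences, for illustration only).
subalignment = ["MK-LV", "MKALV", "M--L-", "-K-LV"]
kept = match_columns(subalignment)
trimmed = trim_alignment(subalignment, kept)
```

With uniform weights, the third column of the toy subalignment is occupied by only one of four sequences (25%) and is therefore excluded. Giving that one sequence a larger weight can push the column back over the 30% threshold, which is why the choice of weighting scheme matters for the resulting model.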