Chunk #1 — Introduction

Source
Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.

Text

The majority of computational prediction methods use evolutionary sequence conservation and/or structural annotations within homologous (orthologous and/or paralogous) proteins from a database of known sequences and/or structures [Ng and Henikoff, 2006]. Traditionally, the BLAST range of pairwise alignment [Altschul et al., 1990] and sequence profile algorithms [Altschul et al., 1997] have been used to search large sequence databases for homologous proteins falling within a predefined similarity threshold. However, weaknesses of these algorithms include the position-invariant scoring matrices in BLAST and the ad hoc estimation of algorithm parameters (that is, position-invariant gap penalties) in PSI-BLAST [Bateman and Haft, 2002]. In contrast, hidden Markov models (HMMs) [Eddy, 1996; Krogh et al., 1994] are powerful probabilistic models that can capture position-specific information within a multiple sequence alignment (MSA) of homologous sequences. Here, an MSA is represented as a series of match, insert, and delete states linked together via state transitions. A match state models the position-specific amino acid probabilities (with Dirichlet mixtures [Sjölander et al., 1996]) at each column within the sequence alignment, whereas insert/delete states allow for