
Compositional Adjustment of Dirichlet Mixture Priors.

Authors :
Ye, Xugang
Yu, Yi-Kuo
Altschul, Stephen F.
Source :
Journal of Computational Biology. Dec 2010, Vol. 17, Issue 12, pp. 1607-1620. 14 pp.
Publication Year :
2010

Abstract

Dirichlet mixture priors provide a Bayesian formalism for scoring alignments of protein profiles to individual sequences, which can be generalized to constructing scores for multiple-alignment columns. A Dirichlet mixture is a probability distribution over multinomial space, each of whose components can be thought of as modeling a type of protein position. Applied to the simplest case of pairwise sequence alignment, a Dirichlet mixture is equivalent to an implied symmetric substitution matrix. For alphabets of even size L, Dirichlet mixtures with L/2 components and symmetric substitution matrices have an identical number of free parameters. Although this suggests the possibility of a one-to-one mapping between the two formalisms, we show that there are some symmetric matrices no Dirichlet mixture can imply, and others implied by many distinct Dirichlet mixtures. Dirichlet mixtures are derived empirically from curated sets of multiple alignments. They imply 'background' amino acid frequencies characteristic of these sets, and should thus be non-optimal for comparing proteins with non-standard composition. Given a mixture Θ, we seek an adjusted Θ′ that implies the desired composition, but that minimizes an appropriate relative-entropy-based distance function. To render the problem tractable, we fix the mixture parameter as well as the sum of the Dirichlet parameters for each component, allowing only its center of mass to vary. This linearizes the constraints on the remaining parameters. An approach to finding Θ′ may be based on small consecutive parameter adjustments. The relative entropy of two Dirichlet distributions separated by a small change in their parameter values implies a quadratic cost function for such changes. For a small change in implied background frequencies, this function can be minimized using the Lagrange-Newton method. We have implemented this method, and can compositionally adjust to good precision a 20-component Dirichlet mixture prior for proteins in under half a second on a standard workstation. [ABSTRACT FROM AUTHOR]
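
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Dirichlet moment formulas and reads the "implied" pairwise frequencies as the probability of drawing two residues independently from one multinomial sampled from the mixture. The function names and the toy 4-letter mixture are hypothetical.

```python
import numpy as np

def implied_background(weights, alphas):
    """Background frequencies p_i = sum_k m_k * alpha_{k,i} / alpha_k*.

    With the mixture weights m_k and the component sums alpha_k* held fixed,
    p is linear in each component's center of mass.
    """
    weights = np.asarray(weights, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    centers = alphas / alphas.sum(axis=1, keepdims=True)  # center of mass per component
    return weights @ centers

def implied_pair_frequencies(weights, alphas):
    """Symmetric pair frequencies q_ij = sum_k m_k * E_k[theta_i * theta_j].

    Assumption: two residues are drawn independently from the same multinomial
    theta, and theta is drawn from the Dirichlet mixture.
    """
    weights = np.asarray(weights, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    n_letters = alphas.shape[1]
    q = np.zeros((n_letters, n_letters))
    for m, a in zip(weights, alphas):
        s = a.sum()
        # Dirichlet second moments: (a_i * a_j + delta_ij * a_i) / (s * (s + 1))
        q += m * (np.outer(a, a) + np.diag(a)) / (s * (s + 1.0))
    return q

def free_parameters(n_letters):
    """Free-parameter counts for an L/2-component mixture and a symmetric matrix."""
    n_components = n_letters // 2
    mixture = n_components * n_letters + (n_components - 1)  # alphas plus weights (sum to 1)
    matrix = n_letters * (n_letters + 1) // 2 - 1            # symmetric entries, normalized
    return mixture, matrix

# Hypothetical toy mixture over a 4-letter alphabet with 2 components.
toy_weights = [0.6, 0.4]
toy_alphas = [[2.0, 1.0, 1.0, 0.5],
              [0.3, 0.3, 1.2, 1.2]]

background = implied_background(toy_weights, toy_alphas)
pair_freqs = implied_pair_frequencies(toy_weights, toy_alphas)
print("background:", background)
print("row sums of pair frequencies:", pair_freqs.sum(axis=1))
print("free parameters for L = 20:", free_parameters(20))
```

Under these assumptions the row sums of the pair-frequency matrix reproduce the implied background frequencies, and the two parameter counts agree for any even alphabet size.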

Details

Language :
English
ISSN :
1066-5277
Volume :
17
Issue :
12
Database :
Academic Search Index
Journal :
Journal of Computational Biology
Publication Type :
Academic Journal
Accession number :
91277230
Full Text :
https://doi.org/10.1089/cmb.2010.0117