A Constrained Gaussian Mixture Model for Correlation-Based Cluster Analysis of Gene Expression Data

Authors :
Naoto Yukinawa
Kazuo Kobayashi
Shin Ishii
Naotake Ogasawara
Taku Yoshioka
Source :
IPSJ Transactions on Bioinformatics. 2:47-62
Publication Year :
2009
Publisher :
Information Processing Society of Japan, 2009.

Abstract

Clustering is a practical data analysis step in gene expression-based studies. Model-based clustering methods, which rely on probabilistic generative models, have two advantages: the number of clusters can be determined by statistical criteria, and the clusters are robust to observation noise in the data. Many existing approaches assume multivariate Gaussian mixtures as generative models, which is analogous to using a Euclidean or Mahalanobis-type distance as the similarity measure. However, these similarity measures often fail to detect co-expressed gene groups. We propose a novel probabilistic model for cluster analysis based on the correlation between gene expression patterns. We also propose a “meta” cluster analysis method to eliminate the dependence of the clustering result on the initial values of the clustering algorithm. In empirical studies with a time-course gene expression dataset of Bacillus subtilis during sporulation, our method yields more stable and informative results than ordinary Gaussian mixture model-based clustering, k-means clustering, and hierarchical clustering, which are widely used in this field. In addition, with the meta-cluster analysis, biologically meaningful expression patterns are extracted from a set of clustering results. The constraints in our model worked more efficiently than those in previous studies. In our experiment, these constraints contributed to the stability of the clustering results. Moreover, clustering based on Bayesian inference was found to be more stable than clustering based on conventional maximum likelihood estimation.
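
The abstract's point that Euclidean-type distances can miss co-expressed genes can be illustrated with a small sketch. The profiles below are made-up numbers, not data from the paper, and the helper functions `euclidean` and `pearson` are illustrative names rather than part of the authors' model. The sketch only contrasts the two similarity measures; it does not implement the constrained Gaussian mixture model or the meta-cluster analysis.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical time-course profiles (assumed values for illustration only).
gene_a = [1.0, 2.0, 4.0, 2.0, 1.0]      # peaks in the middle
gene_b = [5.0 * v for v in gene_a]      # same shape as gene_a, larger amplitude
gene_c = [4.0, 2.0, 1.0, 2.0, 4.0]      # opposite shape, similar magnitude to gene_a

# Euclidean distance places gene_a nearer to gene_c than to gene_b,
# even though gene_a and gene_b rise and fall together.
print("Euclidean a-b:", euclidean(gene_a, gene_b))
print("Euclidean a-c:", euclidean(gene_a, gene_c))

# Correlation groups gene_a with gene_b and separates it from gene_c.
print("Pearson a-b:", pearson(gene_a, gene_b))
print("Pearson a-c:", pearson(gene_a, gene_c))
```

In this toy case the two measures disagree about which gene is most similar to `gene_a`, which is the kind of situation that motivates a correlation-based model.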

Details

ISSN :
18826679
Volume :
2
Database :
OpenAIRE
Journal :
IPSJ Transactions on Bioinformatics
Accession number :
edsair.doi...........a4f2a77664b8c445d85d70a9aae39e92
Full Text :
https://doi.org/10.2197/ipsjtbio.2.47