A Constrained Gaussian Mixture Model for Correlation-Based Cluster Analysis of Gene Expression Data
- Source :
- IPSJ Transactions on Bioinformatics. 2:47-62
- Publication Year :
- 2009
- Publisher :
- Information Processing Society of Japan, 2009.
Abstract
- Clustering is a practical data analysis step in gene expression-based studies. Model-based clustering methods, which rely on probabilistic generative models, have two advantages: the number of clusters can be determined by statistical criteria, and the clusters are robust against observation noise in the data. Many existing approaches assume multivariate Gaussian mixtures as generative models, which is analogous to using a Euclidean or Mahalanobis-type distance as the similarity measure. However, these similarity measures often fail to detect co-expressed gene groups. We propose a novel probabilistic model for cluster analysis based on the correlation between gene expression patterns. We also propose a “meta” cluster analysis method to eliminate the dependence of the clustering result on the initial values of the clustering algorithm. In empirical studies with a time-course gene expression dataset of Bacillus subtilis during sporulation, our method yields more stable and informative results than ordinary Gaussian mixture model-based clustering, k-means clustering, and hierarchical clustering, which are widely used in this field. In addition, with the meta-cluster analysis, biologically meaningful expression patterns are extracted from a set of clustering results. The constraints in our model worked more efficiently than those in previous studies, and in our experiment they contributed to the stability of the clustering results. Moreover, clustering based on Bayesian inference was found to be more stable than clustering based on conventional maximum likelihood estimation.
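The abstract's point that distance-based similarity can miss co-expressed genes can be illustrated with a minimal sketch. This is not the paper's model. The profiles `gene_a`, `gene_b`, and `gene_c` are hypothetical values chosen for illustration, and the helpers `pearson` and `euclidean` are plain NumPy implementations of the two similarity measures.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff))

# Hypothetical time-course profiles (illustrative values, not data from the paper).
gene_a = np.array([1.0, 2.0, 4.0, 3.0, 1.0])
gene_b = 5.0 * gene_a + 2.0                    # same shape as gene_a, larger amplitude and offset
gene_c = np.array([2.0, 2.5, 2.0, 2.5, 2.0])   # different shape, similar magnitude to gene_a

corr_ab = pearson(gene_a, gene_b)
corr_ac = pearson(gene_a, gene_c)
dist_ab = euclidean(gene_a, gene_b)
dist_ac = euclidean(gene_a, gene_c)

print(f"correlation a-b: {corr_ab:.3f}, a-c: {corr_ac:.3f}")
print(f"euclidean   a-b: {dist_ab:.3f}, a-c: {dist_ac:.3f}")
```

In this toy case, `gene_a` and `gene_b` rise and fall together and are perfectly correlated, yet Euclidean distance places `gene_a` closer to the differently shaped `gene_c`. A correlation-based measure groups the co-expressed pair regardless of amplitude or baseline.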
- Subjects :
- Clustering high-dimensional data
Fuzzy clustering
Computer science
business.industry
Single-linkage clustering
Correlation clustering
Pattern recognition
Biochemistry, Genetics and Molecular Biology (miscellaneous)
Computer Science Applications
Hierarchical clustering
Determining the number of clusters in a data set
Artificial intelligence
Cluster analysis
business
k-medians clustering
Details
- ISSN :
- 18826679
- Volume :
- 2
- Database :
- OpenAIRE
- Journal :
- IPSJ Transactions on Bioinformatics
- Accession number :
- edsair.doi...........a4f2a77664b8c445d85d70a9aae39e92
- Full Text :
- https://doi.org/10.2197/ipsjtbio.2.47