1. Quality-based distance measures and applications to clustering
- Author
Edward R. Dougherty, Darin Taverna, Yi Chen, and Marcel Brun
- Subjects
Fuzzy clustering, CURE data clustering algorithm, Correlation clustering, Consensus clustering, Constrained clustering, Data mining, Biology, Cluster analysis, k-medians clustering, Hierarchical clustering
- Abstract
When analyzing biological data sets, a common approach is to partition the data into clusters. Examples include finding a subset of genes with co-regulated expression across experiments, grouping similar disease phenotypes, or implicating regions of genetic variation in disease. The ability to separate the data into subsets depends upon the structure of the distribution of points and the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric that allows all data, regardless of their error, to be included in the analysis without introducing a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise.
- Published
- 2006