Using Evidence of Mixed Populations to Select Variables for Clustering Very High-Dimensional Data.

Authors :
Chan, Yao-ban
Hall, Peter
Source :
Journal of the American Statistical Association; Jun 2010, Vol. 105, Issue 490, p798-809, 12p
Publication Year :
2010

Abstract

In this paper we develop a nonparametric approach to clustering very high-dimensional data, designed particularly for problems where the mixture nature of a population is expressed through multimodality of its density. Therefore, a technique based implicitly on mode testing can be particularly effective. In principle, several alternative approaches could be used to assess the extent of multimodality, but in the present problem the excess mass method has important advantages. We show that the resulting methodology for determining clusters is particularly effective in cases where the data are relatively heavy tailed or show a moderate to high degree of correlation, or when the number of important components is relatively small. Conversely, in the case of light-tailed, almost-independent components when there are many clusters, clustering in terms of modality can be less reliable than more conventional approaches. This article has supplementary material online. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0162-1459
Volume :
105
Issue :
490
Database :
Complementary Index
Journal :
Journal of the American Statistical Association
Publication Type :
Academic Journal
Accession number :
51980096
Full Text :
https://doi.org/10.1198/jasa.2010.tm09404