Finding Outliers in Gaussian Model-based Clustering.
- Source :
- Journal of Classification. Jul 2024, Vol. 41 Issue 2, p313-337. 25p.
- Publication Year :
- 2024
Abstract
- Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and post hoc outlier identification methods, with the first two often requiring pre-specification of the number of outliers. The fact that sample squared Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is then proposed that removes the points that are least plausible according to the subset log-likelihoods, deeming them outliers, until the subset log-likelihoods adhere to the reference distribution. This results in a trimming method, called OCLUST, that inherently estimates the number of outliers. [ABSTRACT FROM AUTHOR]
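The distributional fact the abstract builds on can be illustrated with a minimal sketch. It is not OCLUST and does not come from the article. It assumes the classical single-component result: for n points from a p-variate Gaussian, with the sample mean and the unbiased sample covariance plugged in, n/(n-1)^2 times the squared Mahalanobis distance follows a Beta(p/2, (n-p-1)/2) distribution. The function name `scaled_sq_mahalanobis` and the choices n = 200, p = 2 are hypothetical, picked only for illustration. The sketch checks two consequences: every scaled distance lies in [0, 1], and their average equals the Beta mean p/(n-1).

```python
import random


def scaled_sq_mahalanobis(points):
    """Return n/(n-1)^2 * D_i^2 for 2-D points (hypothetical helper).

    D_i^2 is the squared Mahalanobis distance of point i from the sample
    mean, using the unbiased sample covariance (divisor n - 1).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix, written out by hand.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    scale = n / (n - 1) ** 2
    scaled = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
        scaled.append(scale * d2)
    return scaled


# Assumed illustration settings: 200 draws from a standard bivariate Gaussian.
rng = random.Random(0)
n, p = 200, 2
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

scaled = scaled_sq_mahalanobis(points)
beta_mean = (p / 2) / (p / 2 + (n - p - 1) / 2)  # simplifies to p / (n - 1)
sample_mean = sum(scaled) / n
print(sample_mean, beta_mean)
```

The agreement of the two means is exact up to rounding rather than approximate, because the squared distances always sum to (n-1)p when the sample covariance is used. A fuller check of the Beta shape would compare the empirical quantiles of `scaled` against Beta quantiles.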
- Subjects :
- *GAUSSIAN mixture models
Details
- Language :
- English
- ISSN :
- 01764268
- Volume :
- 41
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- Journal of Classification
- Publication Type :
- Academic Journal
- Accession number :
- 178528930
- Full Text :
- https://doi.org/10.1007/s00357-024-09473-3