
A Variational EM Acceleration for Efficient Clustering at Very Large Scales.

Authors :
Hirschberger, Florian
Forster, Dennis
Lücke, Jörg
Source :
IEEE Transactions on Pattern Analysis & Machine Intelligence. Dec2022, Vol. 44 Issue Part3, p9787-9801. 15p.
Publication Year :
2022

Abstract

How can we efficiently find very large numbers of clusters C in very large datasets N of potentially high dimensionality D? Here we address the question by using a novel variational approach to optimize Gaussian mixture models (GMMs) with diagonal covariance matrices. The variational method approximates expectation maximization (EM) by applying truncated posteriors as variational distributions and partial E-steps in combination with coresets. Run time complexity to optimize the clustering objective then reduces from O(NCD) per conventional EM iteration to O(N′G²D) for a variational EM iteration on coresets (with coreset size N′ ≤ N and truncation parameter G ≪ C). Based on the strongly reduced run time complexity per iteration, which scales sublinearly with NC, we then provide a concrete, practically applicable, parallelized and highly efficient clustering algorithm. In numerical experiments on standard large-scale benchmarks we (A) show that also overall clustering times scale sublinearly with NC, and (B) observe substantial wall-clock speedups compared to already highly efficient recently reported results. The algorithm’s sublinear scaling allows for applications at scales where alternative methods cease to be applicable. We demonstrate such very large-scale applicability using the YFCC100M benchmark, for which we realize with a GMM of up to 50,000 clusters an optimization of a data density model with up to 150 M parameters. [ABSTRACT FROM AUTHOR]
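
Illustrative note (not part of the author abstract): the idea of a truncated posterior can be sketched as follows, assuming each data point already has a small candidate set of clusters. The function names, the candidate set, and the toy parameters are hypothetical. The sketch does not reproduce the paper's candidate search, partial E-steps, or coreset construction; it only shows why evaluating a few candidates costs less than evaluating all C clusters.

```python
import math

def log_joint(x, weight, mean, var):
    """Log of weight * N(x; mean, diag(var)) for one point and one cluster."""
    acc = math.log(weight)
    for xd, md, vd in zip(x, mean, var):
        acc += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return acc

def truncated_responsibilities(x, candidates, weights, means, variances):
    """Responsibilities restricted to the candidate clusters of one data point.

    Clusters outside `candidates` implicitly receive zero probability, so the
    cost is proportional to len(candidates) * D rather than C * D.
    """
    logs = {c: log_joint(x, weights[c], means[c], variances[c]) for c in candidates}
    top = max(logs.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Toy model (made-up numbers): C = 4 clusters in D = 2 dimensions.
weights = [0.25, 0.25, 0.25, 0.25]
means = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [-10.0, 5.0]]
variances = [[1.0, 1.0]] * 4

# Only clusters 0 and 1 are evaluated for this point; 2 and 3 are skipped.
resp = truncated_responsibilities([0.2, 0.1], [0, 1], weights, means, variances)
```

In this sketch the returned dictionary holds one normalized weight per candidate cluster, and the per-point cost depends on the size of the candidate set instead of the total number of clusters.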

Details

Language :
English
ISSN :
01628828
Volume :
44
Issue :
Part3
Database :
Academic Search Index
Journal :
IEEE Transactions on Pattern Analysis & Machine Intelligence
Publication Type :
Academic Journal
Accession number :
160711839
Full Text :
https://doi.org/10.1109/TPAMI.2021.3133763