Back to Search
Start Over
GIST: Greedy Independent Set Thresholding for Diverse Data Summarization
- Publication Year :
- 2024
-
Abstract
- We introduce a novel subset selection problem called min-distance diversification with monotone submodular utility ($\textsf{MDMS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of $\textsf{MDMS}$ is to maximize an objective function combining a monotone submodular utility term and a min-distance diversity term between any pair of selected points, subject to a cardinality constraint. We propose the $\texttt{GIST}$ algorithm, which achieves a $\frac{1}{2}$-approximation guarantee for $\textsf{MDMS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate to within a factor of $0.5584$. Finally, we demonstrate that $\texttt{GIST}$ outperforms existing benchmarks for on a real-world image classification task that studies single-shot subset selection for ImageNet.<br />Comment: 19 pages, 3 figures
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2405.18754
- Document Type :
- Working Paper