1. Fast Error-Bounded Distance Distribution Computation
- Author
-
Man Lung Yiu, Jiahao Zhang, Qing Li, and Bo Tang
- Subjects
Work (thermodynamics) ,Computer science ,Computation ,Sampling (statistics) ,02 engineering and technology ,Distance measures ,Computer Science Applications ,Distribution (mathematics) ,Computational Theory and Mathematics ,Orders of magnitude (time) ,020204 information systems ,Bounded function ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,Algorithm ,Information Systems - Abstract
In this work we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms sampling-based solution (without error guarantees) by up to three orders of magnitude.
- Published
- 2022