Back to Search Start Over

Randomized near-neighbor graphs, giant components and applications in data science.

Authors :
Linderman, George C
Linderman, George C
Mishne, Gal
Jaffe, Ariel
Kluger, Yuval
Steinerberger, Stefan
Linderman, George C
Linderman, George C
Mishne, Gal
Jaffe, Ariel
Kluger, Yuval
Steinerberger, Stefan
Source :
Journal of applied probability; vol 57, iss 2, 458-476; 0021-9002
Publication Year :
2020

Abstract

If we pick n random points uniformly in [0, 1] d and connect each point to its c d log n-nearest neighbors, where d ≥ 2 is the dimension and c d is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to c d,1 log log n points chosen randomly among its c d,2 log n-nearest neighbors to ensure a giant component of size n - o(n) with high probability. This construction yields a much sparser random graph with ~ n log log n instead of ~ n log n edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its k nearest neighbors, one can often pick k' ≪ k random points out of the k nearest neighbors and only connect to those without sacrificing quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.

Details

Database :
OAIster
Journal :
Journal of applied probability; vol 57, iss 2, 458-476; 0021-9002
Notes :
application/pdf, Journal of applied probability vol 57, iss 2, 458-476 0021-9002
Publication Type :
Electronic Resource
Accession number :
edsoai.on1391603108
Document Type :
Electronic Resource