Efficient distributed algorithms for distance join queries in Spark-based spatial analytics systems.

Authors :
García-García, Francisco
Corral, Antonio
Iribarne, Luis
Vassilakopoulos, Michael
Source :
International Journal of General Systems. Apr 2023, Vol. 52, Issue 3, pp. 206-250. 45 pp.
Publication Year :
2023

Abstract

Apache Sedona (formerly GeoSpark) is a new in-memory cluster computing system for processing large-scale spatial data, which extends the core of Apache Spark to support spatial datatypes, partitioning techniques, spatial indexes, and spatial operations (e.g. spatial range, nearest neighbor, and spatial join queries). Distance-based Join Queries (DJQs), like nearest neighbor join (kNNJQ) or closest pairs queries (kCPQ), are not supported by it. Therefore, in this paper, we investigate how to design and implement efficient DJQ distributed algorithms in Apache Sedona, using the most appropriate spatial partitioning and other optimization techniques. The results of an extensive set of experiments with real-world datasets are presented, demonstrating that the proposed kNNJQ and kCPQ distributed algorithms are efficient, scalable, and robust in Apache Sedona. Finally, Sedona is also compared to other similar cluster computing systems, showing the best performance for kCPQ and competitive results for kNNJQ. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0308-1079
Volume :
52
Issue :
3
Database :
Academic Search Index
Journal :
International Journal of General Systems
Publication Type :
Academic Journal
Accession number :
163552880
Full Text :
https://doi.org/10.1080/03081079.2023.2173750