Back to Search Start Over

Efficient Similarity Join Based on Earth Mover’s Distance Using MapReduce.

Authors :
Xu, Jia
Lei, Bin
Gu, Yu
Winslett, Marianne
Yu, Ge
Zhang, Zhenjie
Source :
IEEE Transactions on Knowledge & Data Engineering. Aug2015, Vol. 27 Issue 8, p2148-2162. 15p.
Publication Year :
2015

Abstract

Earth Mover’s Distance (EMD) evaluates the similarity between probability distributions, known as a robust measure more consistent with human similarity perception than traditional similarity functions. EMD-based similarity join retrieves pairs of probability distributions with EMD below a specified threshold, supporting many important applications, such as duplicate image retrieval and sensor pattern recognition. This paper studies the possibility of using MapReduce to improve the scalability of EMD similarity join. While existing MapReduce optimization techniques mainly aim to minimize the communication overhead, such methods are not applicable to our problem, due to the high computational cost of EMD. Utilizing the dual-program mapping technique, we present a new general data partition framework to facilitate effective workload decomposition using MapReduce, ensuring similar distributions in terms of EMD are mapped to the same reduce task for further verification. New optimization strategies are also proposed to balance the workloads among reduce tasks and eliminate large unnecessary EMD evaluations. Our experiments verify the superiority of our proposal on system efficiency, with a huge advantage of at least one order of magnitude than the state-of-the-art solution, and on system effectiveness, with a real case study towards the abused image phenomenon on the most popular C2C website in China. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10414347
Volume :
27
Issue :
8
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
103677751
Full Text :
https://doi.org/10.1109/TKDE.2015.2411281