1. Part-Join: partition-based string similarity join.
- Author
CHEN Yi-cheng, LUO Ji-zhou, and LI Jian-zhong
- Abstract
Many efficient similarity join algorithms have been proposed recently. However, these algorithms use only the local information of individual strings and neglect the global information of the data set, so their performance has not been sufficiently improved. This paper proposes Part-Join, which partitions the data set into subsets with the help of frequency vectors, the alphabet, and the frequency distribution, and devises several pruning strategies to filter out dissimilar string pairs. Experimental results show that Part-Join is 10% to 15% more efficient than Pass-Join. [ABSTRACT FROM AUTHOR]
- Published
- 2014
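The abstract does not give the details of Part-Join, so the following is only a minimal sketch of the general idea of pruning with character-frequency vectors, not the authors' algorithm. It assumes an edit-distance threshold `tau` (the setting Pass-Join addresses) and relies on a standard bound: one edit operation changes the L1 distance between two frequency vectors by at most 2, so any pair whose frequency vectors differ by more than `2 * tau` can be discarded without computing the edit distance. The names `frequency_distance` and `similarity_join` are hypothetical, and the sketch compares all pairs rather than partitioning the data set.

```python
from collections import Counter
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def frequency_distance(fa: Counter, fb: Counter) -> int:
    """L1 distance between two character-frequency vectors."""
    return sum(abs(fa[c] - fb[c]) for c in fa.keys() | fb.keys())


def similarity_join(strings, tau):
    """Return all pairs within edit distance tau (illustrative, all-pairs)."""
    freqs = [Counter(s) for s in strings]  # one frequency vector per string
    results = []
    for i, j in combinations(range(len(strings)), 2):
        # Length filter: each edit changes the length by at most 1.
        if abs(len(strings[i]) - len(strings[j])) > tau:
            continue
        # Frequency filter: each edit changes the L1 distance by at most 2.
        if frequency_distance(freqs[i], freqs[j]) > 2 * tau:
            continue
        # Only surviving candidates pay for the full verification.
        if edit_distance(strings[i], strings[j]) <= tau:
            results.append((strings[i], strings[j]))
    return results
```

For example, with `tau = 1` the pair ("kitten", "sitting") is rejected by the frequency filter alone, since the two frequency vectors differ by 5, which exceeds `2 * tau`.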