Back to Search Start Over

Exploration on efficient similar sentences extraction.

Authors :
Gu, Yanhui
Yang, Zhenglu
Xu, Guandong
Nakano, Miyuki
Toyoda, Masashi
Kitsuregawa, Masaru
Source :
World Wide Web. Jul2014, Vol. 17 Issue 4, p595-626. 32p.
Publication Year :
2014

Abstract

Measuring the semantic similarity between sentences is an essential issue for many applications, such as text summarization, Web page retrieval, question-answer model, image extraction, and so forth. A few studies have explored on this issue by several techniques, e.g., knowledge-based strategies, corpus-based strategies, hybrid strategies, etc. Most of these studies focus on how to improve the effectiveness of the problem. In this paper, we address the efficiency issue, i.e., for a given sentence collection, how to efficiently discover the top- k semantic similar sentences to a query. The previous methods cannot handle the big data efficiently, i.e., applying such strategies directly is time consuming because every candidate sentence needs to be tested. In this paper, we propose efficient strategies to tackle such problem based on a general framework. The basic idea is that for each similarity, we build a corresponding index in the preprocessing. Traversing these indices in the querying process can avoid to test many candidates, so as to improve the efficiency. Moreover, an optimal aggregation algorithm is introduced to assemble these similarities. Our framework is general enough that many similarity metrics can be incorporated, as will be discussed in the paper. We conduct extensive experimental evaluation on three real datasets to evaluate the efficiency of our proposal. In addition, we illustrate the trade-off between the effectiveness and efficiency. The experimental results demonstrate that the performance of our proposal outperforms the state-of-the-art techniques on efficiency while keeping the same high precision as them. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1386145X
Volume :
17
Issue :
4
Database :
Academic Search Index
Journal :
World Wide Web
Publication Type :
Academic Journal
Accession number :
96330334
Full Text :
https://doi.org/10.1007/s11280-012-0195-z