SimCC: A novel method to consider both content and citations for computing similarity of scientific papers

Authors :
Masoud Reyhani Hamedani
Sang-Wook Kim
Dong-Jin Kim
Source :
Information Sciences. :273-292
Publication Year :
2016
Publisher :
Elsevier BV, 2016.

Abstract

To compute the similarity of scientific papers, text-based similarity measures, link-based similarity measures, and hybrid methods can be applied. The text-based and link-based similarity measures take into account only a single aspect of scientific papers, content or citations, respectively. The hybrid methods consider both content and citations; however, they do not carefully consider the relation between the content of a pair of papers involved in a citation relationship. In this paper, we propose a novel method, SimCC (similarity based on content and citations), that considers both aspects, content and citations, to compute the similarity of scientific papers. Unlike previous methods, SimCC effectively reflects both content and authority of scientific papers simultaneously in similarity computation by applying a new RA (relevance and authority) weighting scheme. Also, we propose an RA+R weighting scheme to consider the recency of papers and an RA+E weighting scheme to take into account the author expertise of papers in similarity computation. The effectiveness of our proposed method is demonstrated by extensive experiments on a real-world dataset of scientific papers. The results show that our method achieves more than 100% improvement in accuracy in comparison with previous methods.
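
The three families of measures that the abstract contrasts can be illustrated with a generic baseline. The sketch below is not SimCC: the abstract does not specify how the RA, RA+R, or RA+E weighting schemes are computed, so none of them is reproduced here. The function names, the whitespace tokenizer, the choice of Jaccard overlap for citations, and the mixing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def text_similarity(doc_a: str, doc_b: str) -> float:
    """Text-based baseline: cosine similarity of raw term-frequency vectors."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_similarity(refs_a: set, refs_b: set) -> float:
    """Link-based baseline: Jaccard overlap of the two papers' reference sets."""
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0


def hybrid_similarity(doc_a: str, refs_a: set, doc_b: str, refs_b: set,
                      alpha: float = 0.5) -> float:
    """Naive hybrid: a fixed linear mix of the two scores (alpha is hypothetical).

    The mix treats content and citations independently, which is the kind of
    limitation the abstract attributes to earlier hybrid methods.
    """
    return (alpha * text_similarity(doc_a, doc_b)
            + (1 - alpha) * link_similarity(refs_a, refs_b))


if __name__ == "__main__":
    score = hybrid_similarity(
        "similarity of scientific papers", {"p1", "p2"},
        "computing similarity of papers", {"p2", "p3"},
    )
    print(f"hybrid score: {score:.3f}")
```

In this baseline the content score and the citation score never interact: a citation counts the same whether or not the citing and cited papers discuss related content. According to the abstract, SimCC addresses exactly that gap through its weighting schemes.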

Details

ISSN :
0020-0255
Database :
OpenAIRE
Journal :
Information Sciences
Accession number :
edsair.doi...........3d3c7a71c459da7eac3d4c08582f25fa