SimCC: A novel method to consider both content and citations for computing similarity of scientific papers
- Source: Information Sciences, pp. 273-292
- Publication Year: 2016
- Publisher: Elsevier BV, 2016.
Abstract
- To compute the similarity of scientific papers, text-based similarity measures, link-based similarity measures, and hybrid methods can be applied. The text-based and link-based similarity measures take into account only a single aspect of scientific papers, content or citations, respectively. The hybrid methods consider both content and citations; however, they do not carefully consider the relation between the content of a pair of papers involved in a citation relationship. In this paper, we propose a novel method, SimCC (similarity based on content and citations), that considers both aspects, content and citations, to compute the similarity of scientific papers. Unlike previous methods, SimCC effectively reflects both content and authority of scientific papers simultaneously in similarity computation by applying a new RA (relevance and authority) weighting scheme. Also, we propose an RA+R weighting scheme to consider the recency of papers and an RA+E weighting scheme to take into account the author expertise of papers in similarity computation. The effectiveness of our proposed method is demonstrated by extensive experiments on a real-world dataset of scientific papers. The results show that our method achieves more than 100% improvement in accuracy in comparison with previous methods.
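The distinction the abstract draws between text-based, link-based, and hybrid measures can be illustrated with a minimal sketch. This is not SimCC and does not implement the RA, RA+R, or RA+E weighting schemes, whose definitions the abstract does not give. It is a generic baseline that assumes raw term-frequency cosine similarity for content, Jaccard overlap of reference lists for citations, and a hypothetical mixing weight `alpha`; the function names and the `text`/`refs` dictionary keys are illustrative only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Text-based measure: cosine of raw term-frequency vectors."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(refs_a, refs_b):
    """Link-based measure: overlap of the two papers' reference sets."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def hybrid_similarity(paper_a, paper_b, alpha=0.5):
    """Naive hybrid: a fixed linear mix of the two measures.

    alpha is a hypothetical parameter, not taken from the paper.
    """
    text_sim = cosine_similarity(paper_a["text"], paper_b["text"])
    link_sim = jaccard_similarity(paper_a["refs"], paper_b["refs"])
    return alpha * text_sim + (1 - alpha) * link_sim
```

A fixed mix like this treats content and citations as independent signals, which is the limitation the abstract attributes to earlier hybrid methods: it does not look at how the content of a citing paper relates to the content of the paper it cites.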
- Subjects :
- Scheme (programming language)
Information Systems and Management
Information retrieval
Relation (database)
Computer science
05 social sciences
050905 science studies
Computer Science Applications
Theoretical Computer Science
Weighting
Similarity (network science)
Artificial Intelligence
Control and Systems Engineering
Content (measure theory)
Relevance (information retrieval)
0509 other social sciences
050904 information & library sciences
Citation
computer
Software
computer.programming_language
Details
- ISSN: 00200255
- Database: OpenAIRE
- Journal: Information Sciences
- Accession number: edsair.doi...........3d3c7a71c459da7eac3d4c08582f25fa