SciND: a new triplet-based dataset for scientific novelty detection via knowledge graphs.
- Source :
- International Journal on Digital Libraries. Dec 2024, Vol. 25 Issue 4, p639-659. 21p.
- Publication Year :
- 2024
Abstract
- Detecting texts that contain new information at the semantic level is not straightforward, and the problem becomes more challenging for research articles. Over the years, many datasets and techniques have been developed for automatic novelty detection. However, most existing textual novelty detection studies target general domains such as newswire, and no comprehensive dataset for scientific novelty detection is available in the literature. In this paper, we present a new triplet-based corpus (SciND) for scientific novelty detection from research articles via knowledge graphs. The proposed dataset consists of three types of triplets: (i) triplets for the knowledge graph, (ii) novel triplets, and (iii) non-novel triplets. We build a scientific knowledge graph for research articles using triplets across several natural language processing (NLP) domains and extract novel triplets from papers published in 2021. For the non-novel articles, we use blog post summaries of the research articles. Our knowledge graph is domain-specific and covers seven NLP domains. We further use a feature-based novelty detection scheme as a baseline and use it to show the applicability of the proposed dataset. The baseline algorithm yields an F1 score of 72%. We analyze the results and discuss future directions for the dataset. To the best of our knowledge, this is the first dataset for scientific novelty detection via a knowledge graph. We make our code and dataset publicly available at https://github.com/92Komal/Scientific_Novelty_Detection. [ABSTRACT FROM AUTHOR]
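The abstract's idea of comparing triplets against a knowledge graph and scoring with F1 can be illustrated with a minimal Python sketch. It is not the authors' feature-based baseline: it assumes a knowledge graph can be modeled as a plain set of (subject, predicate, object) triplets and that a candidate counts as novel when it is absent from that set. The names `Triplet`, `is_novel`, and `f1_score`, and the toy triplets, are hypothetical and do not come from the SciND repository.

```python
from typing import NamedTuple, Sequence, Set


class Triplet(NamedTuple):
    """A (subject, predicate, object) fact; field names are illustrative."""
    subject: str
    predicate: str
    obj: str


def is_novel(candidate: Triplet, knowledge_graph: Set[Triplet]) -> bool:
    """Assumption: a triplet is novel if the graph does not already contain it."""
    return candidate not in knowledge_graph


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 for the 'novel' class: harmonic mean of precision and recall."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data for illustration only.
knowledge_graph = {Triplet("model_a", "evaluated_on", "dataset_x")}
seen = Triplet("model_a", "evaluated_on", "dataset_x")
unseen = Triplet("model_b", "evaluated_on", "dataset_x")
```

Exact set membership is the simplest possible test; a practical system would presumably need to handle paraphrased entities and relations, which is where a feature-based scheme such as the one the abstract mentions comes in.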
- Subjects :
- *KNOWLEDGE graphs
- *SCIENTIFIC knowledge
- *DATA mining
- *BLOGS
- *ALGORITHMS
Details
- Language :
- English
- ISSN :
- 14325012
- Volume :
- 25
- Issue :
- 4
- Database :
- Academic Search Index
- Journal :
- International Journal on Digital Libraries
- Publication Type :
- Academic Journal
- Accession number :
- 180588023
- Full Text :
- https://doi.org/10.1007/s00799-023-00386-x