
SciND: a new triplet-based dataset for scientific novelty detection via knowledge graphs.

Authors :
Gupta, Komal
Ahmad, Ammaar
Ghosal, Tirthankar
Ekbal, Asif
Source :
International Journal on Digital Libraries. Dec. 2024, Vol. 25, Issue 4, pp. 639-659 (21 pp.).
Publication Year :
2024

Abstract

Detecting texts that contain new information at the semantic level is not straightforward, and the problem is even more challenging for research articles. Over the years, many datasets and techniques have been developed for automatic novelty detection. However, most existing investigations of textual novelty detection target general domains such as newswire, and no comprehensive dataset for scientific novelty detection is available in the literature. In this paper, we present SciND, a new triplet-based corpus for scientific novelty detection from research articles via knowledge graphs. The dataset consists of three types of triplets: (i) triplets used to build the knowledge graph, (ii) novel triplets, and (iii) non-novel triplets. We build a scientific knowledge graph from triplets extracted from research articles across several natural language processing (NLP) domains, and we extract novel triplets from papers published in 2021. For the non-novel articles, we use blog-post summaries of research articles. The knowledge graph is domain-specific and covers seven NLP domains. As a baseline, we use a feature-based novelty detection scheme over research articles, and we demonstrate the applicability of the dataset with this baseline algorithm, which achieves an F1 score of 72%. We also present an analysis and discuss future research directions enabled by the dataset. To the best of our knowledge, this is the first dataset for scientific novelty detection via a knowledge graph. Our code and dataset are publicly available at https://github.com/92Komal/Scientific_Novelty_Detection. [ABSTRACT FROM AUTHOR]
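The triplet framing in the abstract can be illustrated with a deliberately simplified sketch. It assumes each triplet is a (head, relation, tail) string tuple, models the knowledge graph as a set of normalized edges built from earlier papers, and labels a candidate triplet novel only when no exact match exists in the graph. This is NOT the authors' feature-based baseline, which the abstract does not detail; the helper names (`Triplet`, `normalize`, `build_kg`, `is_novel`, `f1`) are hypothetical, and the F1 helper simply treats "novel" as the positive class.

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """Hypothetical (head, relation, tail) fact extracted from a paper."""
    head: str
    relation: str
    tail: str


def normalize(t: Triplet) -> Triplet:
    # Lowercase each part and collapse runs of whitespace so trivial
    # surface variants ("Text  Classification") map to the same edge.
    return Triplet(*(" ".join(part.lower().split()) for part in t))


def build_kg(triplets: Iterable[Triplet]) -> set[Triplet]:
    # Assumption: the knowledge graph is modeled as a plain set of edges.
    return {normalize(t) for t in triplets}


def is_novel(candidate: Triplet, kg: set[Triplet]) -> bool:
    # Exact-match rule: a triplet is novel iff its edge is absent from the KG.
    return normalize(candidate) not in kg


def f1(y_true: list[bool], y_pred: list[bool]) -> float:
    # Binary F1 with "novel" (True) as the positive class.
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example triplets, for illustration only.
    kg = build_kg([
        Triplet("BERT", "used-for", "text classification"),
        Triplet("Transformer", "used-for", "machine translation"),
    ])
    candidates = [
        Triplet("bert", "used-for", "Text  Classification"),
        Triplet("BERT", "used-for", "scientific novelty detection"),
    ]
    preds = [is_novel(c, kg) for c in candidates]
    print(preds)  # [False, True]
    print(f1([False, True], preds))  # 1.0
```

Exact matching is the simplest possible decision rule; it misses paraphrased relations and entity aliases, which is one reason a learned or feature-based scorer such as the paper's baseline is needed in practice.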

Details

Language :
English
ISSN :
1432-5012
Volume :
25
Issue :
4
Database :
Academic Search Index
Journal :
International Journal on Digital Libraries
Publication Type :
Academic Journal
Accession number :
180588023
Full Text :
https://doi.org/10.1007/s00799-023-00386-x