1. Generic features selection for structure classification of diverse styled scholarly articles.
- Author
- Waqas, Muhammad and Anjum, Nadeem
- Abstract
The enormous growth in online research publications in diversified domains has attracted the research community to extract these valuable scientific resources by searching online digital libraries and publishers' websites. A precise search is desired to list the most related articles by applying semantic queries to the document's metadata and the structural elements. The online search engines and digital libraries offer only keyword-based search on full-body text, which creates excessive results. Therefore, the research article's structural and metadata information has to be stored in machine-comprehensible form by the online research publishers. The research community in recent years has adopted different approaches to extract structural information from research documents, such as rule-based heuristics and machine-learning-based approaches. Studies suggest that machine-learning-based techniques have produced optimum results for document structure extraction from publishers having diversified publication layouts. In this paper, we have proposed thirteen different logical layout structural (LLS) components. We have identified a two-staged innovative set of generic features that are associated with the LLS. This approach has given our technique an advantage against the state-of-the-art for structural classification of digital scientific articles with diversified publication styles. We have applied chi-square (χ²) for feature selection, and the final result has revealed that SVM (kernel function) has produced an optimum result with an overall F-measure of 0.95. [ABSTRACT FROM AUTHOR]
- Published
- 2024