
A co‐training‐based approach for the hierarchical multi‐label classification of research papers

Authors :
Khalil Drira
Hatem Bellaaj
Abir Masmoudi
Mohamed Jmaiel
Université de Sfax - University of Sfax
Équipe Services et Architectures pour Réseaux Avancés (LAAS-SARA)
Laboratoire d'analyse et d'architecture des systèmes (LAAS)
Université Toulouse - Jean Jaurès (UT2J)
Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées
Centre National de la Recherche Scientifique (CNRS)
Université Toulouse III - Paul Sabatier (UT3)
Institut National des Sciences Appliquées - Toulouse (INSA Toulouse)
Institut National des Sciences Appliquées (INSA)
Institut National Polytechnique (Toulouse) (Toulouse INP)
Unité de Recherche en développement et contrôle d'applications distribuées (REDCAD)
École Nationale d'Ingénieurs de Sfax | National School of Engineers of Sfax (ENIS)
Université Toulouse Capitole (UT Capitole)
Université de Toulouse (UT)
Source :
Expert Systems (Wiley), 2021, 38 (4), e12613. DOI: 10.1111/exsy.12613
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

This paper focuses on the hierarchical multi‐label classification of research papers, that is, the task of assigning to a paper the set of relevant labels drawn from a hierarchy, when only a small amount of labelled training data is available. Specifically, we study how to leverage unlabelled data, which are usually plentiful and easy to collect, in addition to the few available labelled papers in a semi‐supervised learning framework in order to improve performance. We propose a semi‐supervised approach for this task based on the well‐known Co‐training algorithm, which exploits content and bibliographic coupling information as two distinct views of a paper. In our approach, two hierarchical multi‐label classifiers are trained on different views of the labelled data and iteratively select their most confident unlabelled samples, which are then added to the labelled set. The success of the proposed Co‐training‐based approach rests on two main components. The first is the use of two proposed selection criteria (i.e., Maximum Agreement and Labels Cardinality Consistency) that enforce the selection of confident unlabelled samples. The second is the application of an oversampling method that rebalances the label distribution of the initial labelled set, which reduces the reinforcement of label imbalance during Co‐training. The proposed approach is evaluated on a collection of scientific papers extracted from the ACM Digital Library. Experiments show the effectiveness of our approach compared with several baseline methods.
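The abstract outlines the Co‐training loop but not its exact formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each view uses a hypothetical k‐nearest‐neighbour multi‐label classifier. Predicted label sets are closed under a hypothetical `parent` map of the label hierarchy. "Maximum Agreement" is approximated as both views predicting the identical label set, and "Labels Cardinality Consistency" as the predicted set size staying within `tolerance` of the labelled set's mean cardinality. The oversampling step is omitted. All names (`co_train`, `knn_predict`, `close_under_ancestors`, `k`, `per_round`, `tolerance`) are illustrative only.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]
LabelSet = FrozenSet[str]


def close_under_ancestors(labels: Set[str], parent: Dict[str, str]) -> LabelSet:
    """Add every ancestor of each label so the set respects the hierarchy."""
    closed = set(labels)
    for label in labels:
        node: Optional[str] = parent.get(label)
        while node is not None:
            closed.add(node)
            node = parent.get(node)
    return frozenset(closed)


def knn_predict(x: Vector, train_x: List[Vector], train_y: List[LabelSet],
                parent: Dict[str, str], k: int) -> Tuple[LabelSet, float]:
    """Hypothetical per-view classifier: majority vote of the k nearest labelled papers.

    Returns a hierarchy-consistent label set and a confidence in [0, 1], computed
    as the mean vote margin |2f - 1| over every label that received a vote."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))[:k]
    votes: Dict[str, int] = {}
    for i in nearest:
        for label in train_y[i]:
            votes[label] = votes.get(label, 0) + 1
    fractions = {label: v / len(nearest) for label, v in votes.items()}
    predicted = {label for label, f in fractions.items() if f > 0.5}
    margins = [abs(2 * f - 1) for f in fractions.values()]
    confidence = sum(margins) / len(margins) if margins else 1.0
    return close_under_ancestors(predicted, parent), confidence


def co_train(view_a: List[Vector], view_b: List[Vector], labels: List[LabelSet],
             unlabelled_a: List[Vector], unlabelled_b: List[Vector],
             parent: Dict[str, str], rounds: int = 5, per_round: int = 2,
             k: int = 3, tolerance: float = 1.0):
    """Agreement-based co-training over two views (e.g. content, bibliographic coupling)."""
    la, lb, ly = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unlabelled_a)))
    for _ in range(rounds):
        mean_cardinality = sum(len(y) for y in ly) / len(ly)
        candidates = []
        for j in pool:
            pred_a, conf_a = knn_predict(unlabelled_a[j], la, ly, parent, k)
            pred_b, conf_b = knn_predict(unlabelled_b[j], lb, ly, parent, k)
            # Simplified stand-in for "Maximum Agreement": both views must output
            # exactly the same non-empty label set.
            if not pred_a or pred_a != pred_b:
                continue
            # Simplified stand-in for "Labels Cardinality Consistency": the set size
            # must stay close to the mean cardinality of the current labelled set.
            if abs(len(pred_a) - mean_cardinality) > tolerance:
                continue
            candidates.append((conf_a + conf_b, j, pred_a))
        if not candidates:
            break
        # Move only the most confident agreeing samples into the labelled set.
        candidates.sort(key=lambda c: c[0], reverse=True)
        for _, j, pred in candidates[:per_round]:
            la.append(unlabelled_a[j])
            lb.append(unlabelled_b[j])
            ly.append(pred)
            pool.remove(j)
    return la, lb, ly
```

For example, with a two‐level hierarchy `{"ml": "cs", "db": "cs"}`, an unlabelled paper whose content view and citation view both point to machine‐learning neighbours is added with the labels `{cs, ml}`, while a paper whose two views disagree stays in the pool. Requiring both views to agree adds fewer pseudo‐labelled papers per round but makes each one more trustworthy, in the spirit of the abstract's aim of selecting only confident samples.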

Details

Language :
English
ISSN :
0266-4720 and 1468-0394
Database :
OpenAIRE
Journal :
Expert Systems (Wiley)
Accession number :
edsair.doi.dedup.....c14c75581a1696cca71f96c09d455e4c