1. Multi-label classification of legislative contents with hierarchical label attention networks
- Author
- Mário J. Silva, Miguel Won, Danielle Caled, and Bruno Martins
- Subjects
- Structure (mathematical logic), Multi-label classification, Hierarchy, Thesaurus (information retrieval), Computer science, Deep learning, Library and Information Sciences, Index (publishing), Artificial intelligence, European Union, Classifier (UML), Natural language processing
- Abstract
- EuroVoc is a thesaurus maintained by the European Union Publication Office, used to describe and index legislative documents. The EuroVoc concepts are organized in a hierarchical structure, with 21 domains, 127 micro-thesauri terms, and more than 6,700 detailed descriptors. The large number of concepts in the EuroVoc thesaurus makes the manual classification of legal documents highly costly. To facilitate this classification work, we present two main contributions. The first is a hierarchical deep learning model for classifying legal documents according to the EuroVoc thesaurus. Instead of training a separate classifier for each hierarchy level, our model predicts the three levels of the EuroVoc thesaurus simultaneously. Our second contribution is a new legal corpus for evaluating the classification of documents written in Portuguese. This corpus, named EUR-Lex PT, contains more than 220k documents, labeled under the three EuroVoc hierarchical levels. Comparative experiments with other state-of-the-art models indicate that our approach achieves competitive results while also offering the ability to interpret predictions through attention weights.
- Published
- 2021