GLADIS: A General and Large Acronym Disambiguation Benchmark

Authors :
Chen, Lihu
Varoquaux, Gaël
Suchanek, Fabian M.
Institut Polytechnique de Paris (IP Paris)
Département Informatique et Réseaux (INFRES)
Télécom ParisTech
Data, Intelligence and Graphs (DIG)
Laboratoire Traitement et Communication de l'Information (LTCI)
Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
Modelling brain structure, function and variability based on high-field MRI data (PARIETAL)
Service NEUROSPIN (NEUROSPIN)
Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA))
Institut National de Recherche en Informatique et en Automatique (Inria)
Laboratoire de Neuroimagerie Assistée par Ordinateur (LNAO)
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
Méthodes computationnelles et mathématiques pour comprendre la société et la santé à partir de données (SODA)
Inria Saclay - Ile de France
ANR-20-CHIA-0012, NoRDF, Modéliser et extraire des informations complexes du texte en langage naturel (2020)
Source :
EACL 2023-The 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023, Dubrovnik, Croatia
Publication Year :
2023
Publisher :
Zenodo, 2023.

Abstract

Acronym Disambiguation (AD) is crucial for natural language understanding across various sources, including biomedical reports, scientific papers, and search engine queries. However, existing acronym disambiguation benchmarks and tools are limited to specific domains, and prior benchmarks are rather small. To accelerate research on acronym disambiguation, we construct a new benchmark named GLADIS with three components: (1) a much larger acronym dictionary with 1.5M acronyms and 6.4M long forms; (2) a pre-training corpus with 160 million sentences; (3) three datasets that cover the general, scientific, and biomedical domains. We then pre-train a language model, AcroBERT, on our constructed corpus for general acronym disambiguation, and show the challenges and value of our new benchmark.
Long paper at EACL 2023.

Details

Database :
OpenAIRE
Journal :
EACL 2023-The 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023, Dubrovnik, Croatia
Accession number :
edsair.doi.dedup.....35aff525211eb09c4b5974dfa5c79775
Full Text :
https://doi.org/10.5281/zenodo.7562819