GLADIS: A General and Large Acronym Disambiguation Benchmark

Authors :: Chen, Lihu
Varoquaux, Gaël
Suchanek, Fabian M.
Institut Polytechnique de Paris (IP Paris)
Département Informatique et Réseaux (INFRES)
Télécom ParisTech
Data, Intelligence and Graphs (DIG)
Laboratoire Traitement et Communication de l'Information (LTCI)
Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
Modelling brain structure, function and variability based on high-field MRI data (PARIETAL)
Service NEUROSPIN (NEUROSPIN)
Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA))
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA))
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Inria Saclay - Ile de France
Institut National de Recherche en Informatique et en Automatique (Inria)
Laboratoire de Neuroimagerie Assistée par Ordinateur (LNAO)
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
Méthodes computationnelles et mathématiques pour comprendre la société et la santé à partir de données (SODA)
Inria Saclay - Ile de France
ANR-20-CHIA-0012,NoRDF,Modéliser et extraire des informations complexes du texte en langage naturel(2020)
Source :: EACL 2023-The 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023, Dubrovnik, Croatia
Publication Year :: 2023
Publisher :: Zenodo, 2023.
Abstract :: Acronym Disambiguation (AD) is crucial for natural language understanding on various sources, including biomedical reports, scientific papers, and search engine queries. However, existing acronym disambiguation benchmarks and tools are limited to specific domains, and the size of prior benchmarks is rather small. To accelerate the research on acronym disambiguation, we construct a new benchmark named GLADIS with three components: (1) a much larger acronym dictionary with 1.5M acronyms and 6.4M long forms; (2) a pre-training corpus with 160 million sentences; (3) three datasets that cover the general, scientific, and biomedical domains. We then pre-train a language model, AcroBERT, on our constructed corpus for general acronym disambiguation, and show the challenges and values of our new benchmark. (Long paper at EACL 2023.)

Subjects :: FOS: Computer and information sciences
Computer Science - Computation and Language
Acronym Disambiguation
[INFO]Computer Science [cs]
Entity Linking
Benchmark
Computation and Language (cs.CL)
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]

Details

Database :: OpenAIRE
Journal :: EACL 2023-The 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023, Dubrovnik, Croatia
Accession number :: edsair.doi.dedup.....35aff525211eb09c4b5974dfa5c79775
Full Text :: https://doi.org/10.5281/zenodo.7562819
