MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation

Authors :
Marzocchi, M
Cremaschi, M
Pozzi, R
Avogadro, R
Palmonari, M
Efthymiou, V
Jiménez-Ruiz, E
Chen, J
Cutrona, V
Hassanzadeh, O
Sequeda, J
Srinivas, K
Abdelmageed, N
Hulsebos, M
Publication Year :
2023
Publisher :
CEUR-WS, 2023.

Abstract

In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. It has been evaluated using MTab, one of the best-performing approaches in the SemTab challenge.

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.od......1299..b2fb79d82a1cc61922daea3d3c9e1507