MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation
- Publication Year :
- 2023
- Publisher :
- CEUR-WS, 2023.
Abstract
- In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. The dataset has been evaluated using MTab, one of the best-performing approaches in the SemTab challenge.
- Subjects :
- Semantic Table Interpretation, Tabular Data, SemTab Challenge, Knowledge Graph
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.od......1299..b2fb79d82a1cc61922daea3d3c9e1507