1. Matching Tabular Data to Knowledge Graph with Effective Core Column Set Discovery.
- Author
Qiu, Jingyi; Song, Aibo; Jin, Jiahui; Chen, Jiaoyan; Zhang, Xinyu; Fang, Xiaolin; Zhang, Tianbo
- Subjects
LANGUAGE models, KNOWLEDGE graphs, SEMANTICS, HEURISTIC
- Abstract
Matching tabular data to a knowledge graph (KG) is critical for understanding the semantic column types, column relationships, and entities of a table. Existing matching approaches rely heavily on core columns that represent primary subject entities on which other columns in the table depend. However, discovering these core columns before understanding the table's semantics is challenging. Most prior works use heuristic rules, such as the leftmost column, to discover a single core column, while an insightful discovery of the core column set that accurately captures the dependencies between columns is often overlooked. To address these challenges, we introduce Dependency-aware Core Column Set Discovery (DaCo), an iterative method that uses a novel rough matching strategy to identify both inter-column dependencies and the core column set. Additionally, DaCo can be seamlessly integrated with pre-trained language models, as proposed in the optimization module. Unlike other methods, DaCo does not require labeled data or contextual information, making it suitable for real-world scenarios. In addition, it can identify multiple core columns within a table, which is common in real-world tables. We conduct experiments on six datasets, including five datasets with single core columns and one dataset with multiple core columns. Our experimental results show that DaCo outperforms existing core column set detection methods, further improving the effectiveness of table understanding tasks. [ABSTRACT FROM AUTHOR]
- Published
2024