Which is better? Taxonomy induction with learning the optimal structure via contrastive learning.

Authors :
Meng, Yuan
Zhai, Songlin
Chai, Zhihua
Zhang, Yuxin
Wu, Tianxing
Qi, Guilin
Song, Wei
Source :
Knowledge-Based Systems. Nov2024, Vol. 304, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

A taxonomy is a hierarchically structured knowledge graph that forms the infrastructure for various downstream applications, including recommender systems, web search, and question answering. Automated induction from text corpora has yielded notable taxonomies such as CN-probase, CN-DBpedia, and Zhishi.schema. Despite these efforts, existing taxonomies still face two critical issues that result in sub-optimal hierarchical structures. On the one hand, commonly observed taxonomies exhibit a coarse-grained, "flat" structure, stemming from a noticeable lack of diversity in both nodes and edges. This limitation originates primarily from biased and homogeneous data distributions. On the other hand, the semantic granularity among "siblings" within these taxonomies remains inconsistent, which makes it difficult to identify hierarchical relations accurately and comprehensively. To address these issues, this study introduces a novel taxonomy induction framework composed of three components. Initially, we establish a seed schema by leveraging statistical information from external data sources as distant supervision to append nodes and edges containing "generic semantics", thereby rectifying biased data distributions. Subsequently, a clustering algorithm groups the nodes based on their similarities, followed by a refinement of the hierarchical relations among these nodes. Building on this seed schema, we propose a fine-grained contrastive learning method in the expansion module to strengthen the utilization of taxonomic structures, consequently improving the precision of query-anchor matching. Finally, we scrutinize the hierarchical relations between each query and its siblings to ensure the integrity of the constructed taxonomy. Extensive experiments on real-world datasets validate the efficacy of the proposed framework for constructing well-structured taxonomies.
• We propose a framework for producing well-structured and comprehensive taxonomies.
• Distant supervision enhances seed taxonomy diversity, addressing data skewness.
• Contrastive learning refines query-anchor matching and ensures semantic consistency.

[ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0950-7051
Volume :
304
Database :
Academic Search Index
Journal :
Knowledge-Based Systems
Publication Type :
Academic Journal
Accession number :
180797858
Full Text :
https://doi.org/10.1016/j.knosys.2024.112405