
A random walk sampling on knowledge graphs for semantic-oriented statistical tasks.

Authors :
Xu, Xiaoliang
Hong, Qifan
Wang, Yuxiang
Jin, Jiahui
Xuan, Xinle
Fu, Tao
Source :
Data & Knowledge Engineering. Jul 2022, Vol. 140, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

A knowledge graph (KG) manages large-scale, real-world facts as a large graph in a schema-flexible manner, and has recently attracted considerable attention. Users commonly run statistical tasks on a KG to obtain latent information of interest. These tasks fall into two types: topology-oriented and semantic-oriented. Much work has addressed the former (e.g., finding the average degree of a KG). The basic idea is to estimate an approximate statistical result from a random sample collected through a topology-aware KG sampling approach. Unfortunately, this method cannot be applied directly to semantic-oriented statistical tasks (e.g., obtaining the average fuel economy of cars produced in Germany), because topology-aware sampling collects the sample using only the topological information of a KG and ignores its semantics, which yields a low-quality random sample and significantly reduces accuracy. In this paper, we propose a semantic-aware random walk sampling method on KGs that quickly and accurately collects samples matching the semantic constraint of a semantic-oriented statistical task, and obtains an approximate statistical result through well-designed unbiased estimators. Moreover, we propose an optimization of our semantic-aware sampling to improve sampling efficiency. Finally, extensive experiments confirm the effectiveness and efficiency of our approach.

• We propose a semantic-aware sampling method on KGs to collect high-quality samples.
• We present unbiased estimators to estimate the statistical results based on samples.
• We propose an optimized strategy to improve the efficiency of our semantic-aware sampling.
• We conduct extensive experiments to verify the superiority of our solution.

[ABSTRACT FROM AUTHOR]
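To make the two task types in the abstract concrete, here is a minimal sketch of the generic technique of estimating a statistic from a simple random walk with degree re-weighting. It is not the semantic-aware sampler or the estimators proposed in the paper, which this record does not describe in detail. The toy graph, the attribute table `attrs`, and the helper names `random_walk` and `reweighted_mean` are all hypothetical. A simple random walk on a connected undirected graph visits a node with long-run frequency proportional to its degree, so weighting each visited node by 1/degree corrects that bias. The resulting ratio estimator is consistent (asymptotically unbiased) rather than exactly unbiased.

```python
import random

# Hypothetical toy KG as an undirected edge list (an assumption for illustration).
edges = [
    ("car_a", "Germany"), ("car_a", "Sedan"),
    ("car_b", "Germany"), ("car_b", "SUV"), ("car_b", "Diesel"),
    ("car_c", "Japan"), ("car_c", "Sedan"), ("car_c", "Diesel"),
    ("car_d", "Germany"),
]

# Build an adjacency list; every edge is stored in both directions.
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

# Hypothetical attributes used by the semantic constraint.
attrs = {
    "car_a": {"country": "Germany", "mpg": 30.0},
    "car_b": {"country": "Germany", "mpg": 40.0},
    "car_c": {"country": "Japan", "mpg": 35.0},
    "car_d": {"country": "Germany", "mpg": 20.0},
}


def random_walk(adj, start, steps, rng):
    """Yield the nodes visited by a simple random walk of `steps` steps."""
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node])
        yield node


def reweighted_mean(adj, walk, value, matches):
    """Degree-reweighted mean of value(v) over visited nodes that satisfy matches(v)."""
    num = 0.0
    den = 0.0
    for v in walk:
        if matches(v):
            w = 1.0 / len(adj[v])  # undo the walk's bias toward high-degree nodes
            num += w * value(v)
            den += w
    return num / den if den else float("nan")


rng = random.Random(0)  # fixed seed so the run is repeatable

# Topology-oriented task: average degree (true value for this toy graph is 2.0).
avg_degree = reweighted_mean(
    adj, random_walk(adj, "car_a", 50_000, rng),
    value=lambda v: len(adj[v]),
    matches=lambda v: True,
)

# Semantic-oriented task: average mpg of cars produced in Germany (true value is 30.0).
avg_mpg = reweighted_mean(
    adj, random_walk(adj, "car_a", 50_000, rng),
    value=lambda v: attrs[v]["mpg"],
    matches=lambda v: attrs.get(v, {}).get("country") == "Germany",
)

print(f"estimated average degree: {avg_degree:.3f}")
print(f"estimated average mpg (Germany): {avg_mpg:.3f}")
```

In this sketch the semantic constraint is applied only as a filter after sampling, so every step that lands on a non-matching node is wasted. That inefficiency is the kind of problem the abstract attributes to topology-aware sampling on semantic-oriented tasks.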

Details

Language :
English
ISSN :
0169023X
Volume :
140
Database :
Academic Search Index
Journal :
Data & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
158565543
Full Text :
https://doi.org/10.1016/j.datak.2022.102024