CrowdSJ: Skyline-Join Query Processing of Incomplete Datasets With Crowdsourcing

Authors :
Linlin Ding
Xiao Zhang
Hanlin Zhang
Liang Liu
Baoyan Song
Source :
IEEE Access, Vol 9, Pp 73216-73229 (2021)
Publication Year :
2021
Publisher :
IEEE, 2021.

Abstract

Skyline queries are widely used in decision-making systems, wireless sensor networks (WSNs), and similar applications. As a variation of the skyline query, the skyline-join query returns results from multiple datasets. However, incomplete datasets are common because of the widespread use of automated information extraction and aggregation. Existing methods for handling incomplete data, such as probabilistic approaches and data padding, can solve the problem but cannot effectively reflect the real situation and lack completeness. Therefore, to reflect the real situation more accurately and in a more user-centric way, this paper studies the problem of skyline-join queries over incomplete datasets with crowdsourcing, named CrowdSJ. The crowdsourcing-based skyline-join query processing problem over incomplete datasets is divided into two cases. In the first, the skyline-join query involves only the unknown crowdsourcing attribute and the join attribute; this case is named Partial Skyline-Join with Crowdsourcing (PSJCrowd). In the second, the skyline-join query involves all the attributes; this case is named All Skyline-Join with Crowdsourcing (ASJCrowd). For PSJCrowd, we first filter the known dataset. We then present the level-preference-tree-index and propose the partial skyline-join with crowdsourcing algorithm. For ASJCrowd, we likewise first filter the known dataset. Second, we build a level-preference-tree-index based on the known attributes of the incomplete dataset. Third, we propose CrowdSJ-single, an algorithm for skyline-join with crowdsourcing on a single dataset, to filter the dataset containing unknown attributes. We then build a global level-preference-tree-index based on the known attributes of both the incomplete dataset and the complete dataset, and propose CrowdSJ-multiple, an algorithm for skyline-join with crowdsourcing on multiple datasets. Finally, we filter the joined tuples based on the global level-preference-tree-index and the results of each round of crowdsourcing.
Extensive experiments on synthetic and real datasets demonstrate that our algorithms are efficient and effective.
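
For readers unfamiliar with the term, the following is a minimal sketch of what a skyline-join computes when all attribute values are known. It is a naive illustration only and is not the PSJCrowd, CrowdSJ-single, or CrowdSJ-multiple algorithm: it uses no level-preference-tree-index, no filtering, and no crowdsourcing. The function names, the "smaller is better" convention, and the sample data are assumptions made for this example.

```python
def dominates(a, b):
    """Return True if tuple a dominates tuple b.

    Assumption: smaller values are preferred on every attribute.
    a dominates b when a is no worse everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline_join(left, right):
    """Naive skyline-join over two complete datasets.

    left, right: lists of (join_key, attribute_tuple).
    Tuples are joined on equal keys, their attributes are concatenated,
    and only joined tuples not dominated by any other joined tuple are kept.
    """
    joined = [
        (k1, a1 + a2)
        for k1, a1 in left
        for k2, a2 in right
        if k1 == k2
    ]
    return [t for t in joined if not any(dominates(u[1], t[1]) for u in joined)]


# Hypothetical data: hotels as (city, (price, distance)),
# flights as (city, (fare, stops)).
hotels = [("A", (100, 2)), ("A", (80, 3)), ("B", (90, 1))]
flights = [("A", (200, 1)), ("B", (150, 0)), ("B", (300, 0))]

print(skyline_join(hotels, flights))
```

With incomplete data, some attribute values in such tuples would be missing, so the dominance test could not be decided directly; per the abstract, CrowdSJ resolves those unknown values through rounds of crowdsourcing.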

Details

Language :
English
ISSN :
21693536
Volume :
9
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.19d247ddd14600b7e4a3bbe22e95ca
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2021.3079324