Back to Search Start Over

Two-layer partitioned and deletable deep bloom filter for large-scale membership query.

Authors :
Zeng, Meng
Zou, Beiji
Zhang, Wensheng
Yang, Xuebing
Kong, Guilan
Kui, Xiaoyan
Zhu, Chengzhang
Source :
Information Systems. Oct2023, Vol. 119, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

The recently proposed Learned Bloom Filter (LBF) provides a new perspective on large-scale membership queries by using machine learning to replace the traditional bloom filter. However, reducing the false positive rate (FPR) of the learned model with small memory usage, and supporting deletion efficiently become the new issues. In this paper, we propose a novel Two-layer Partitioned and Deletable Deep Bloom Filter (PDDBF) for large-scale membership query, which can reduce the FPR with small memory usage and support deletion efficiently. The proposed PDDBF consists of three main parts: (1) Data partition. To improve the classification accuracy of the learned model, the K-means cluster with the elbow method is used for the data partition. (2) Deep Bloom Filter. To reduce the FPR, deep learning models are used to construct multiple independent learning mechanisms, which correspond to the clusters obtained by part1. (3) Partitioned backup filter. To support deletion under the premise of ensuring low FPR and reducing query time consumption, combine the perfect hash (PH) table and counting bloom filters (CBFs) on the basis of the partition bloom filter. Experiments show that the proposed PDDBF reduces the FPR 87.13% with the same memory usage compared with the state-of-the-art PLBF on real-world URLs data set. Moreover, the PDDBF reduces the FPR 99.68% with the same memory usage and reduces the query time consumption to 2.61x that of the PLBF after data deletion, respectively. • A novel PDDBF model is proposed for large-scale membership query. • A data partition part is devised to improve the classification accuracy of the learned model. • Multiple deep bloom filters are designed to reduce false positive rate. • The PH table and the partitioned CBFs are combined to support deletion efficiently. • The false positive rate of the model is low and can efficiently support data deletion. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
03064379
Volume :
119
Database :
Academic Search Index
Journal :
Information Systems
Publication Type :
Academic Journal
Accession number :
173156058
Full Text :
https://doi.org/10.1016/j.is.2023.102267