Back to Search
Start Over
Two-layer partitioned and deletable deep bloom filter for large-scale membership query.
- Source :
-
Information Systems . Oct2023, Vol. 119, pN.PAG-N.PAG. 1p. - Publication Year :
- 2023
-
Abstract
- The recently proposed Learned Bloom Filter (LBF) provides a new perspective on large-scale membership queries by using machine learning to replace the traditional bloom filter. However, reducing the false positive rate (FPR) of the learned model with small memory usage, and supporting deletion efficiently become the new issues. In this paper, we propose a novel Two-layer Partitioned and Deletable Deep Bloom Filter (PDDBF) for large-scale membership query, which can reduce the FPR with small memory usage and support deletion efficiently. The proposed PDDBF consists of three main parts: (1) Data partition. To improve the classification accuracy of the learned model, the K-means cluster with the elbow method is used for the data partition. (2) Deep Bloom Filter. To reduce the FPR, deep learning models are used to construct multiple independent learning mechanisms, which correspond to the clusters obtained by part1. (3) Partitioned backup filter. To support deletion under the premise of ensuring low FPR and reducing query time consumption, combine the perfect hash (PH) table and counting bloom filters (CBFs) on the basis of the partition bloom filter. Experiments show that the proposed PDDBF reduces the FPR 87.13% with the same memory usage compared with the state-of-the-art PLBF on real-world URLs data set. Moreover, the PDDBF reduces the FPR 99.68% with the same memory usage and reduces the query time consumption to 2.61x that of the PLBF after data deletion, respectively. • A novel PDDBF model is proposed for large-scale membership query. • A data partition part is devised to improve the classification accuracy of the learned model. • Multiple deep bloom filters are designed to reduce false positive rate. • The PH table and the partitioned CBFs are combined to support deletion efficiently. • The false positive rate of the model is low and can efficiently support data deletion. [ABSTRACT FROM AUTHOR]
- Subjects :
- *DEEP learning
*K-means clustering
*ELBOW
*MACHINE learning
Subjects
Details
- Language :
- English
- ISSN :
- 03064379
- Volume :
- 119
- Database :
- Academic Search Index
- Journal :
- Information Systems
- Publication Type :
- Academic Journal
- Accession number :
- 173156058
- Full Text :
- https://doi.org/10.1016/j.is.2023.102267