D$^{3}$: A Dynamic Dual-Phase Deduplication Framework for Distributed Primary Storage
- Source :
- IEEE Transactions on Computers. 67:193-207
- Publication Year :
- 2018
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2018.
Abstract
- Deploying deduplication for distributed primary storage is a sophisticated and challenging task, considering that the demands of low read/write latency, stable read/write performance, and efficient space saving are all of paramount importance. Unfortunately, existing schemes cannot present a satisfactory solution for the aforementioned requirements simultaneously. In this article, we propose D$^{3}$, a dynamic dual-phase deduplication framework for distributed primary storage. Several major innovations are established in D$^{3}$. First, we formulate a deduplication-oriented taxonomy called Dedup-Type, to group data with similar deduplication-related characteristics into larger categories. It serves as a coarse-grained filter and one of the prioritizing references in D$^{3}$. Second, D$^{3}$ is a dual-phase framework: inline-phase and offline-phase deduplication processes work in concert with each other. Third, D$^{3}$ operates in a dynamic manner. We design two critical mechanisms: context-aware threshold adjustment (CTA) for local inline-phase deduplication, and deferred priority-based enforcement (DPE) for global offline-phase deduplication. The CTA mechanism enables selective deduplication under a periodically updated threshold. Data skipped during the inline phase is regarded as a candidate for the offline phase, and is handled in a prioritized order under the governance of the DPE mechanism. Evaluation results demonstrate that, compared with conventional inline and offline deduplication schemes, D$^{3}$ achieves more efficient and more stable read/write performance with competitive space saving.
- Subjects :
- Computational Theory and Mathematics
Hardware and Architecture
Computer science
Data_FILES
0202 electrical engineering, electronic engineering, information engineering
Data deduplication
020206 networking & telecommunications
02 engineering and technology
Parallel computing
Software
020202 computer hardware & architecture
Theoretical Computer Science
Details
- ISSN :
- 0018-9340
- Volume :
- 67
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Computers
- Accession number :
- edsair.doi...........6e5f6c0888e9fd81d6f53f344e6618b6