1. Dynamic Hot Data Identification Using a Stack Distance Approximation
- Author
- Hyeonji Ha, Daeun Shim, Hyeyin Lee, and Dongchul Park
- Subjects
General Computer Science, Computational complexity theory, Computer science, hot data, Computation, Big data, Hash function, Word error rate, 02 engineering and technology, Bloom filter, flash memory, stack distance, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, SSD, business.industry, hot data identification, Search engine indexing, General Engineering, Approximation algorithm, 020207 software engineering, TK1-9971, Computer engineering, Electrical engineering. Electronics. Nuclear engineering, Cache, business
- Abstract
Though various applications, such as flash memory, caches, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has holistically examined alternative strategies. Rather, researchers tend to classify hot data simplistically by considering one or more frequency metrics, thereby disregarding recency, which is also an important consideration. In practice, different workloads require different treatment to achieve effective hot data decisions. This paper proposes a dynamic hot data identification scheme that adopts a workload stack distance approximation. Stack distance is a good recency measure, but computing it traditionally incurs high computational complexity and requires additional space. Since efficient stack distance calculation is a core component of our dynamic feature design, this paper additionally proposes a stack distance approximation algorithm that significantly reduces both computation and space requirements. To our knowledge, the proposed scheme is the first dynamic hot data identification scheme that judiciously assigns more weight to either recency or frequency based on workload characteristics. Our experiments with diverse realistic workloads demonstrate that our stack distance approximation achieves excellent accuracy (up to a 0.1% error rate) and that our dynamic scheme improves performance by as much as 49.8%.
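For readers unfamiliar with the recency measure mentioned in the abstract, the following is a minimal sketch of an exact (not approximate) stack distance computation over an access trace. It is not the approximation algorithm proposed in the paper; the function name `stack_distances` and the convention that a first-time access is reported as `None` are assumptions made for illustration. The linear search over the stack for every access, and the need to keep every distinct item in the stack, hint at why the exact measure is costly in both time and space.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the exact LRU stack distance of each access in `trace`.

    Assumed convention: the stack distance of an access is the number of
    distinct items referenced since the previous access to the same item.
    A first-time access has no finite distance and is reported as None.
    """
    stack: List[Hashable] = []  # most recently used item is at the end
    distances: List[Optional[int]] = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)  # linear scan: O(M) for M distinct items
            # Items above `pos` were all touched since the last access.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold (first-time) access
        stack.append(item)  # the accessed item becomes most recently used
    return distances


if __name__ == "__main__":
    # Hypothetical trace of logical block addresses, for illustration only.
    print(stack_distances(["a", "b", "c", "a", "b", "b"]))
```

A small distance means the item was reused soon after its previous access (high recency), while a large distance means many other items intervened. This naive version costs O(N·M) time for N accesses over M distinct items and O(M) space.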
- Published
- 2021