1. Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories
- Author
Charles D. Cranor, Garth A. Gibson, Bradley W. Settlemyer, George Amvrosiadis, Gregory R. Ganger, Qing Zheng, Ankush Jain, and Gary Grider
- Subjects
Multi-core processor; Data processing; Speedup; Computer science; business.industry; Search engine indexing; 020206 networking & telecommunications; Memory bandwidth; 02 engineering and technology; Computational science; Hardware and Architecture; Analytics; 020204 information systems; Server; 0202 electrical engineering, electronic engineering, information engineering; business; Data compression
- Abstract
Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing.
- Published
- 2020