1. Optimal placement for repair-efficient erasure codes in geo-diverse storage centres.
- Author
-
Mohan, Lakshmi J., Rajawat, Ketan, Parampalli, Udaya, and Harwood, Aaron
- Subjects
-
*LINEAR network coding, *DATA warehousing, *BENEFIT performances, *CIPHERS, *STORAGE
- Abstract
-
Erasure codes are increasingly used by storage providers to reduce the cost of reliably storing large volumes of data. Compared to the default mechanism of triple replication, erasure codes offer optimal storage efficiency but require significant network and disk usage when repairing failed data. The repair process is particularly complicated for storage clusters whose data centres are spread across a wide geographical area. For such geo-diverse clusters, the recovery performance of the code depends more on network throughput and latency than on the computations required for decoding. Hence, the recovery performance of most erasure codes can be improved if the surviving blocks needed for a node repair are placed optimally. This article affirms the idea by proposing an optimization framework for the placement of blocks in geo-distributed storage clusters, addressing the open problem posed by Dimakis et al. in their celebrated paper on network coding. To this end, a signomial program is formulated that yields the optimal placement minimizing the average single-block repair cost over a large number of files. Though the problem is non-convex, its structure allows us to use a monomial approximation to solve it efficiently. MATLAB simulation results, together with an equivalent implementation using popular codes in a Hadoop storage setting, are presented and validate our framework. The idea could be applied to any coded geo-diverse storage system to achieve significant benefits in repair performance during node failures.
• A novel framework for optimal placement of storage codes is put forth. Unlike traditional approaches that rely on simple heuristics to place blocks, we formulate the optimal block placement problem for a geo-diverse cluster. While the resulting problem is non-convex and difficult to work with, we propose an approximate algorithm for solving it in an efficient and scalable manner.
• The framework is simulated with popular erasure codes using the MATLAB CVX toolbox, with promising results. The model is then implemented in a Hadoop setting, and the results are observed to validate the modelling.
• A set of enhancements to the proposed framework is discussed, covering changes to network topology and storage cost, that make it adaptable to more generic scenarios.
[ABSTRACT FROM AUTHOR]
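As a rough illustration of the placement objective described in the abstract, the sketch below searches exhaustively for the block placement that minimizes the average single-block repair cost on a toy cluster. It is not the authors' signomial program or their monomial approximation. Everything in it is an assumption made for illustration: the `cost` matrix of per-block transfer costs between sites, the rule that any `k` surviving blocks can rebuild a lost block (as with an MDS code), rebuilding at the failed block's own site, and the per-site `capacity` cap standing in for a fault-tolerance constraint.

```python
from itertools import product


def repair_cost(placement, failed, k, cost):
    """Cost of rebuilding block `failed` at its own site.

    Assumption: any k surviving blocks suffice (MDS-style repair), so the
    k cheapest transfers into the failed block's site are used.
    """
    site = placement[failed]
    transfers = sorted(
        cost[placement[b]][site]
        for b in range(len(placement))
        if b != failed
    )
    return sum(transfers[:k])


def average_repair_cost(placement, k, cost):
    """Average single-block repair cost over all n blocks of one stripe."""
    n = len(placement)
    return sum(repair_cost(placement, f, k, cost) for f in range(n)) / n


def best_placement(n, k, cost, capacity):
    """Exhaustive search over site assignments (toy sizes only).

    `capacity` is a hypothetical cap on blocks per site; without it the
    search would trivially put every block in one data centre.
    """
    sites = range(len(cost))
    best = None
    for placement in product(sites, repeat=n):
        if any(placement.count(s) > capacity for s in sites):
            continue
        c = average_repair_cost(placement, k, cost)
        if best is None or c < best[0]:
            best = (c, placement)
    return best


# Hypothetical symmetric transfer costs between three data centres.
cost = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
avg, placement = best_placement(n=4, k=2, cost=cost, capacity=2)
print(avg, placement)
```

Exhaustive search grows as the number of sites raised to the number of blocks, which is why a formulation that can be solved approximately, as the abstract describes, is needed at realistic scale.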
- Published
- 2020