Understanding performance of distributed data-intensive applications

Authors :
Michael V. Miceli
Shantenu Jha
Christopher Miceli
Bety Rodriguez-Milla
Source :
Philosophical transactions. Series A, Mathematical, physical, and engineering sciences. 368(1926)
Publication Year :
2010

Abstract

Grids, clouds and cloud-like infrastructures are capable of supporting a broad range of data-intensive applications. There are interesting and unique performance issues that appear as the volume of data and degree of distribution increase. New scalable data-placement and management techniques, as well as novel approaches to determine the relative placement of data and computational workload, are required. We develop and study a genome sequence matching application that is simple to control and deploy, yet serves as a prototype of a data-intensive application. The application uses a SAGA-based implementation of the All-Pairs pattern. This paper aims to understand some of the factors that influence the performance of this application and the interplay of those factors. We also demonstrate how the SAGA approach can enable data-intensive applications to be extensible and interoperable over a range of infrastructure. This capability enables us to compare and contrast two different approaches for executing distributed data-intensive applications: simple application-level data-placement heuristics versus distributed file systems.

Details

ISSN :
1364-503X
Volume :
368
Issue :
1926
Database :
OpenAIRE
Journal :
Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
Accession number :
edsair.doi.dedup.....413b8d306262762a2e31ef663836c048