Understanding performance of distributed data-intensive applications
- Source :
- Philosophical transactions. Series A, Mathematical, physical, and engineering sciences. 368(1926)
- Publication Year :
- 2010
- Abstract :
- Grids, clouds and cloud-like infrastructures are capable of supporting a broad range of data-intensive applications. There are interesting and unique performance issues that appear as the volume of data and degree of distribution increase. New scalable data-placement and management techniques, as well as novel approaches to determine the relative placement of data and computational workload, are required. We develop and study a genome sequence matching application that is simple to control and deploy, yet serves as a prototype of a data-intensive application. The application uses a SAGA-based implementation of the All-Pairs pattern. This paper aims to understand some of the factors that influence the performance of this application and the interplay of those factors. We also demonstrate how the SAGA approach can enable data-intensive applications to be extensible and interoperable over a range of infrastructure. This capability enables us to compare and contrast two different approaches for executing distributed data-intensive applications: simple application-level data-placement heuristics versus distributed file systems.
- Subjects :
- Matching (statistics)
Theoretical computer science
Computer science
General Mathematics
Distributed computing
Interoperability
General Engineering
Volume (computing)
General Physics and Astronomy
Cloud computing
Grid computing
Scalability
Data-intensive computing
Heuristics
Details
- ISSN :
- 1364-503X
- Volume :
- 368
- Issue :
- 1926
- Database :
- OpenAIRE
- Journal :
- Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
- Accession number :
- edsair.doi.dedup.....413b8d306262762a2e31ef663836c048