Design and analyses of web scraping on burstable virtual machines.
- Source :
- Concurrency & Computation: Practice & Experience; 4/25/2024, Vol. 36 Issue 9, p1-13, 13p
- Publication Year :
- 2024
Abstract
- Summary: Web scraping is a widely used technique for decision‐making, collecting, and structuring public data from the internet. As the volume of data continues to grow, the need for more efficient methods of data extraction becomes crucial. This article introduces a novel web scraping framework that utilizes Burstable virtual machines (VMs) on Amazon Web Services with the objective of reducing the monetary cost of execution while ensuring compliance with service level agreements (SLAs). To achieve this, the framework utilizes a combination of fixed and temporary Burstable VMs in a mixed cluster, which can be elastically scaled up to fulfill the SLA and scaled down to minimize monetary costs. Two strategies for handling VM allocation are proposed and evaluated: (i) a queue and SLA‐based strategy that employs queue size information and SLA criteria to determine the required number of VMs for the current scraping requests, and (ii) a credit‐based strategy that incorporates information about Burstable VM credits to effectively manage instance creation and termination. Experimental tests show that the proposed framework meets the defined SLA while achieving cost reductions of up to 74% compared to an approach that executes on fixed‐size clusters of Burstable instances. [ABSTRACT FROM AUTHOR]
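The queue and SLA-based strategy in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, its parameters, and the assumption that each request takes a fixed, known time on one VM are all hypothetical, chosen only to show how queue size and an SLA deadline could be turned into a VM count for a mixed cluster of fixed and temporary instances.

```python
import math


def required_vms(queue_size, sla_seconds, secs_per_request, fixed_vms, max_vms):
    """Estimate the total number of VMs needed to drain the queue within the SLA.

    All parameters are illustrative assumptions, not values from the article:
    queue_size       -- pending scraping requests
    sla_seconds      -- time window allowed by the SLA
    secs_per_request -- assumed constant processing time per request on one VM
    fixed_vms        -- VMs that are always running
    max_vms          -- upper bound on cluster size
    """
    if queue_size <= 0:
        # Nothing queued: keep only the fixed part of the cluster.
        return fixed_vms
    # Requests a single VM can complete inside the SLA window (at least one).
    per_vm_capacity = max(1, int(sla_seconds // secs_per_request))
    needed = math.ceil(queue_size / per_vm_capacity)
    # Never drop below the fixed VMs, never exceed the cluster limit.
    return min(max(needed, fixed_vms), max_vms)


def temporary_vms(queue_size, sla_seconds, secs_per_request, fixed_vms, max_vms):
    """Number of temporary VMs to run on top of the fixed ones."""
    total = required_vms(queue_size, sla_seconds, secs_per_request, fixed_vms, max_vms)
    return total - fixed_vms
```

Under these assumptions, a growing queue scales the cluster up and an empty queue scales it back down to the fixed VMs. The credit-based strategy from the abstract would additionally take each Burstable VM's remaining CPU credits into account when deciding which instances to create or terminate; that part is not modeled here.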
Details
- Language :
- English
- ISSN :
- 15320626
- Volume :
- 36
- Issue :
- 9
- Database :
- Complementary Index
- Journal :
- Concurrency & Computation: Practice & Experience
- Publication Type :
- Academic Journal
- Accession number :
- 176213993
- Full Text :
- https://doi.org/10.1002/cpe.7999