
Improving data latency in ETL with filtering algorithms for stream processing: an experimental setup.

Authors :
Sabtu, Adilah
Bahari, Mahadi
Azmi, Nurulhuda Firdaus Mohd
Ali, Nor Azizah
Sulaiman, Zuraidah
Fauzi, Nur Aqidah Mohd
Source :
AIP Conference Proceedings. 2024, Vol. 2991 Issue 1, p1-7. 7p.
Publication Year :
2024

Abstract

The Extract, Transform and Load (ETL) system is a standard approach to managing and sustaining the movement and transaction of data assets. The exponential growth of data calls for better approaches that surpass traditional ETL limits and manage big data effectively in near real time. These approaches must address availability, speed of delivery (latency), and processing scalability, thereby enhancing functionality and increasing data value. The purpose of this research is to identify approaches, and combinations of approaches, for implementing ETL for streaming and near-real-time data, with a special focus on data latency. While these approaches may improve latency, other aspects such as availability and scalability could be compromised, and other limitations may apply. The research then presents an experimental setup that simulates a near-real-time scenario for landslide mapping using Lidar data. It introduces three sets of algorithms, combined in stream processing with the aim of achieving low latency: (1) data model transformation, (2) alert thresholds, and (3) query filtering techniques. Furthermore, the research expects to contribute modified filtering techniques that accelerate queries on prioritized datasets and reduce overall data filtering time. The experiment will be performed on a Hadoop platform. [ABSTRACT FROM AUTHOR]
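The abstract names three stages but does not say how they are implemented. The following is a minimal Python sketch of one way such stages could be chained record by record in a stream. It is not the authors' algorithms. The record layout, the per-sensor elevation baseline, the threshold value, and the `priority` sensor set are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical raw record layout: (sensor_id, x, y, elevation_z, timestamp).
RawPoint = Tuple[str, float, float, float, float]


@dataclass(frozen=True)
class Point:
    """Typed record produced by the transformation stage (hypothetical schema)."""
    sensor_id: str
    x: float
    y: float
    z: float
    ts: float


def transform(records: Iterable[RawPoint]) -> Iterator[Point]:
    # Stage 1, data model transformation: map raw tuples onto a typed schema
    # one record at a time, so no batch has to accumulate before later stages run.
    for sensor_id, x, y, z, ts in records:
        yield Point(sensor_id, x, y, z, ts)


def flag_alerts(points: Iterable[Point], baseline: Dict[str, float],
                threshold: float) -> Iterator[Tuple[Point, bool]]:
    # Stage 2, alert threshold (hypothetical rule): compare each elevation with a
    # per-sensor baseline and flag the point when the absolute change reaches the
    # threshold. Sensors without a baseline get a change of 0 and are never flagged.
    for p in points:
        change = abs(p.z - baseline.get(p.sensor_id, p.z))
        yield p, change >= threshold


def query_filter(flagged: Iterable[Tuple[Point, bool]],
                 priority: Set[str]) -> Iterator[Point]:
    # Stage 3, query filter (hypothetical rule): forward alerted points and points
    # from prioritized sensors; drop everything else early so downstream queries
    # see less data.
    for p, is_alert in flagged:
        if is_alert or p.sensor_id in priority:
            yield p


def run_pipeline(records: Iterable[RawPoint], baseline: Dict[str, float],
                 threshold: float, priority: Set[str]) -> List[Point]:
    # Chain the three generator stages; each record passes through every stage
    # before the next record is read from the input.
    return list(query_filter(flag_alerts(transform(records), baseline, threshold),
                             priority))


if __name__ == "__main__":
    # Illustrative values only; elevations are assumed to be in metres.
    stream = [
        ("s1", 0.0, 0.0, 101.2, 1.0),  # 1.2 m change: alerted
        ("s2", 1.0, 0.0, 98.0, 1.1),   # ~0.1 m change, not prioritized: dropped
        ("s3", 2.0, 0.0, 50.0, 1.2),   # no change, but prioritized: kept
    ]
    baseline = {"s1": 100.0, "s2": 98.1, "s3": 50.0}
    kept = run_pipeline(stream, baseline, threshold=0.5, priority={"s3"})
    print([p.sensor_id for p in kept])  # ['s1', 's3']
```

Because each stage is a generator, a record is transformed, checked, and filtered as soon as it is read, not after a whole batch completes. This is one way of keeping per-record latency low. A Hadoop-based setup such as the one the abstract mentions would distribute this work across a cluster, which this single-process sketch does not attempt.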

Details

Language :
English
ISSN :
0094243X
Volume :
2991
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
177782053
Full Text :
https://doi.org/10.1063/5.0212940