1. Improving data latency in ETL with filtering algorithms for stream processing: an experimental setup.
- Author
- Sabtu, Adilah; Bahari, Mahadi; Azmi, Nurulhuda Firdaus Mohd; Ali, Nor Azizah; Sulaiman, Zuraidah; and Fauzi, Nur Aqidah Mohd
- Subjects
- *BIG data, *ALGORITHMS, *SCALABILITY, *LANDSLIDES, *LIDAR, *DATA modeling
- Abstract
- The Extract, Transform and Load (ETL) system is a standard approach to managing and sustaining the movement and transaction of data assets. The exponential growth of data necessitates better approaches that surpass the limits of traditional ETL and manage big data effectively in near real time, in terms of availability, speed of delivery (latency), and scalability in processing, thereby enhancing functionality and increasing data value. The purpose of this research is to identify approaches, and combinations of approaches, for implementing ETL for streaming and near-real-time data, with a special focus on data latency. It is important to note that while these approaches might improve latency, they could also compromise other aspects such as availability and scalability, and may carry other limitations. The research then presents an experimental setup that simulates a near-real-time scenario for landslide mapping using Lidar data and introduces three sets of algorithm combinations: (1) data model transformation, (2) alert threshold, and (3) query filter techniques in stream processing, with the aim of achieving low latency. Furthermore, the research expects to contribute modified filtering techniques that accelerate queries for prioritized datasets and reduce overall data filtering time. The experiment will be performed on a Hadoop platform. [ABSTRACT FROM AUTHOR]
- Published
- 2024