
Data analytics in an Internet of Things edge cloud setting

Authors :: Erhan, Laura
Publication Year :: 2022
Publisher :: University of Derby, 2022.
Abstract: Over the past years, the Internet of Things (IoT) has vastly expanded, with a multitude of devices now monitoring, sensing, and acting on the surrounding environment. This in turn creates a large amount of data to be processed and analysed in order to gain insight into specific problems. The development of computationally powerful IoT devices allows the processing pipeline to start close to the data collection points, namely at the Edge of a system, and to continue, if needed, all the way up to the Cloud, where heavy processing can be undertaken. We investigate how machine learning techniques can take advantage of the Edge by pushing computation to smaller devices such as the Raspberry Pi, and how IoT data analytics can be obtained with the help of both the Edge and the Cloud. This thesis revolves around three main directions for IoT data analytics. Firstly, we discuss anomaly detection, an important theoretical and practical problem due to its broad set of application domains, ranging from data analysis to industrial automation. Herein, we review state-of-the-art methods that may be employed to detect anomalies in the specific area of sensor systems. In this context, anomaly detection is a particularly hard problem, given the need to find trade-offs between computation, energy, and accuracy in a constrained environment. We taxonomize methods ranging from conventional techniques (statistical methods, time-series analysis, signal processing, etc.) to data-driven techniques (supervised learning, reinforcement learning, deep learning, etc.). We also look at the impact that different architectural environments (Cloud, Fog, Edge) can have on the sensor ecosystem. Secondly, we advocate for the use of machine learning at the sensor nodes to perform data imputation, an essential data-cleaning operation, in order to avoid the transmission of corrupted (and often unusable) data to the Cloud.
Starting from a publicly available pollution dataset, we investigate how two machine learning techniques (kNN and missForest) compare against two statistics-based techniques (mean and MICE) and how these can be embedded on a Raspberry Pi to perform data imputation in real time, without affecting the data collection process. The experimental results provide details of the accuracy and execution times, while demonstrating the accuracy and computational efficiency of edge-learning methods for filling in missing data values in corrupted data series. Finally, we present a case study that is representative of smart cities and IoT analytics in an Edge Cloud setting. A field experiment aiming to better understand the interactions between citizens and urban green spaces was carried out in Sheffield, U.K., involving 1870 participants over two different time periods (7 and 30 days). Objective data (sensor information) and subjective data (direct input from the users) were collected via a smartphone app. Location data from green spaces was complemented by textual and photographic information provided by the users. With the use of data science and machine learning techniques, we identify the main features observed by the citizens through both text and images, the time that people spent in parks, and the top interaction areas. This allows us to gain an overview of certain patterns and the behaviour of the citizens within their surroundings, and demonstrates the capabilities of integrating technology into large-scale social studies.