A modified depth function for outlier detection in multivariate data with applications
- Author
Abedin, Md Jaynal and Newell, John
- Subjects
Data-depth, Engineering, Computer Science, outlier detection, topic modelling, Mathematics, Statistics and Applied Mathematics, Statistics, Mathematics, Data science
- Abstract
Data Science is a new and exciting field that has emerged in response to the staggering amounts of data now generated in many new forms, from digital images to audio to text. It is an interdisciplinary field involving Statistics, Computer Science and Mathematics. It involves the study of data: how they are collected, stored, accessed, visualised, modelled and ultimately used to inform decision making by turning data into intelligence. Despite this ‘data revolution’ and the consequent development of Data Science, the aim of any data analysis remains the same: to make inference about unknown population parameters using sample statistics. One fundamental challenge in inference is the identification of outliers. Such oddities, or atypical observations, could be indicative of poor data management or biased sampling. In this situation the presence of outliers is considered a negative aspect, and efforts are needed to account for them (e.g. by correcting data entry errors) to avoid introducing bias in parameter estimation. On the other hand, finding an outlier may be the key focus of the exercise, as an outlier may represent something new. Many statistical methods have been developed to identify outlying data points, and robust methods have been developed to account for outliers in statistical models. A central property of all such methods is that an observation is classified as either an outlier or not (i.e. a binary decision); being able to quantify an observation's ‘outlyingness’ is clearly an attractive alternative. In this thesis, a novel method is presented for outlier detection in multivariate data based on the idea of a statistical depth function. The proposed approach enables outlier detection in multivariate data while taking into consideration the local geometry of the underlying probability distribution.
2021-09-14
- Published
2020
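The abstract contrasts a binary outlier flag with a graded measure of ‘outlyingness’ derived from a statistical depth function. As a minimal sketch of that general idea only, the code below uses the standard spatial depth, D(x) = 1 - ||(1/n) Σ u(x - X_i)||, where u(v) = v/||v|| and u(0) = 0, and takes outlyingness as 1 - D(x). This is not the modified depth function proposed in the thesis, which is not specified here; the function names `spatial_depth` and `outlyingness` and the toy sample are hypothetical and chosen for illustration.

```python
import math


def spatial_depth(x, data):
    """Spatial depth of point x with respect to a sample (list of equal-length tuples).

    Illustrative standard depth, NOT the thesis's modified depth:
    D(x) = 1 - || mean_i u(x - X_i) ||, with u(v) = v / ||v|| and u(0) = 0.
    Values lie in [0, 1]; central points score high, remote points score low.
    """
    d = len(x)
    total = [0.0] * d
    for xi in data:
        diff = [a - b for a, b in zip(x, xi)]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm > 0:  # a sample point equal to x contributes the zero vector
            for k in range(d):
                total[k] += diff[k] / norm
    n = len(data)
    mean_norm = math.sqrt(sum((t / n) ** 2 for t in total))
    return 1.0 - mean_norm


def outlyingness(x, data):
    """Graded outlyingness score in [0, 1]: the complement of spatial depth."""
    return 1.0 - spatial_depth(x, data)


# Hypothetical toy sample: a small square cluster plus one remote point.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]

# Each point receives a score rather than a yes/no label;
# the centre (0.5, 0.5) has depth exactly 5/6, so outlyingness 1/6.
for p in sample:
    print(p, round(outlyingness(p, sample), 3))
```

Spatial depth is used here because it needs no covariance matrix or matrix inversion and costs O(nd) per point. It is, however, a global measure: it ranks points by centrality relative to the whole sample and does not adapt to local structure such as multiple clusters, which is the aspect the proposed approach is described as taking into account.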