Finding relevant information in big datasets with ML

Authors :
Universitat Politècnica de Catalunya. Doctorat en Computació
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering
Njoku, Uchechukwu Fortune
Abelló Gamazo, Alberto
Bilalli, Besim
Bontempi, Gianluca
Publication Year :
2024

Abstract

Due to the abundance of data, noisy, irrelevant, or redundant features often need to be identified and discarded. Feature selection is a collection of methods used to ensure that only relevant data are used for a data analysis task. Extracting and using only useful data for analysis promotes model understanding and performance and reduces the model training time and variance, i.e., overfitting. There is an abundance of methods for feature selection, and they can be categorised by various perspectives and are applicable to differing use cases. In this tutorial, we introduce the feature selection problem and present it from three perspectives of categorisation: search strategy, model reliance, and relevance definition. Furthermore, we propose a guideline for the use of the various methods. Lastly, we discuss current challenges and opportunities for research on feature selection.

The project leading to this publication has received funding from 2020 research and innovation programme (grant agreement No 955895).
Peer Reviewed
Postprint (published version)

Details

Database :
OAIster
Notes :
4 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1452496822
Document Type :
Electronic Resource