Variable selection and neural networks for high-dimensional data analysis : application in infrared spectroscopy and chemometrics

Authors :
UCL - FSA/ELEC - Département d'électricité
Verleysen, Michel
Meurens, Marc
Wertz, Vincent
De Herde, André
Saerens, Marco
Rossi, Fabrice
Benoudjit, Nabil
Publication Year :
2003

Abstract

This thesis focuses on the application of chemometrics in analytical chemistry. Chemometrics (or multivariate analysis) consists of finding a relationship between two groups of variables, often called dependent and independent variables. In infrared spectroscopy, for instance, chemometrics aims to predict a quantitative variable, such as the concentration of a component in the studied product, from spectral data measured at many wavelengths or wavenumbers (several hundred, or even several thousand). Measuring this variable directly is difficult, because it requires a chemical analysis and a qualified operator. In this research we propose a methodology for handling chemical (spectrophotometric) data, which are often high-dimensional. First, we propose a new incremental (step-by-step) method for selecting spectral variables, based on the combination of three principles: linear or non-linear regression, an incremental procedure for variable selection, and the use of a validation set. On the one hand, this procedure benefits from the advantages of non-linear methods for predicting chemical data (the relationship between dependent and independent variables is often non-linear); on the other hand, it avoids overfitting, one of the most critical problems encountered with non-linear models. Second, we propose to improve this method through a judicious choice of the first selected variable, which strongly influences the final prediction performance. The idea is to select the first variable using a measure of the mutual information between the independent and dependent variables; the incremental (step-by-step) method is then used to select the subsequent variables.
The variable selected by mutual information can be given a good interpretation from a spectrochemical point of view, and does not depend on the data di…
(FSA 3)--UCL, 2003
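
The two-stage selection procedure summarized in the abstract (mutual information for the first variable, then step-by-step forward selection scored on a validation set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. It uses ordinary least squares as the regression model, although the thesis also considers non-linear models. It estimates mutual information with a simple equal-width histogram, chosen for simplicity. It also stops when the validation error no longer decreases, a rule the abstract does not specify. All function names and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in nats (assumed estimator)."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def fit_ols(X, y, ridge=1e-9):
    """Least squares with intercept via the normal equations; returns [b0, b1, ...]."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented system [A^T A + ridge*I | A^T y]; the tiny ridge keeps it solvable.
    M = [[sum(r[i] * r[j] for r in A) + (ridge if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(w, X):
    return [w[0] + sum(c * v for c, v in zip(w[1:], row)) for row in X]


def rmse(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def columns(X, idx):
    """Keep only the columns (e.g. wavelengths) listed in idx."""
    return [[row[j] for j in idx] for row in X]


def validation_error(idx, X_tr, y_tr, X_val, y_val):
    """Fit on the training set, score on the held-out validation set."""
    w = fit_ols(columns(X_tr, idx), y_tr)
    return rmse(predict(w, columns(X_val, idx)), y_val)


def select_variables(X_tr, y_tr, X_val, y_val, max_vars=10):
    n_vars = len(X_tr[0])
    # Stage 1: the first variable is the one sharing the most information with y.
    first = max(range(n_vars),
                key=lambda j: mutual_information([r[j] for r in X_tr], y_tr))
    selected = [first]
    best = validation_error(selected, X_tr, y_tr, X_val, y_val)
    # Stage 2: forward (step-by-step) selection driven by the validation error.
    while len(selected) < max_vars:
        errors = {j: validation_error(selected + [j], X_tr, y_tr, X_val, y_val)
                  for j in range(n_vars) if j not in selected}
        if not errors:
            break
        j_best = min(errors, key=errors.get)
        if errors[j_best] >= best:  # assumed stopping rule: no further improvement
            break
        selected.append(j_best)
        best = errors[j_best]
    return selected, best


# Synthetic "spectra": 20 variables; the target depends only on variables 3 and 7.
rng = random.Random(0)


def make_data(n, n_vars=20):
    X = [[rng.random() for _ in range(n_vars)] for _ in range(n)]
    y = [3 * r[3] + 2 * r[7] + rng.gauss(0, 0.05) for r in X]
    return X, y


X_tr, y_tr = make_data(200)
X_val, y_val = make_data(100)
sel, err = select_variables(X_tr, y_tr, X_val, y_val, max_vars=5)
print(sel, round(err, 3))
```

Replacing `fit_ols` and `predict` with a non-linear regressor would give the non-linear variant of the same procedure. In either case, overfitting is guarded against because every candidate subset is scored on data that was not used for fitting.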

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1130587276
Document Type :
Electronic Resource