Efficient feature selection for mass spectrometry based electronic nose applications

Authors :
Oscar Eduardo Gualdron
Xavier Correig
Benachir Bouchikhi
Maria Vinaixa
R. Gómez
N. El-Barbri
Xavier Vilanova
Eduard Llobet
Jesus Brezmes
J.A. Carrasco
Source :
Digital.CSIC. Repositorio Institucional del CSIC
Publication Year :
2007
Publisher :
Elsevier, 2007.

Abstract

High dimensionality is inherent to MS-based electronic nose applications, where hundreds of variables per measurement (m/z fragments) are available, a significant number of them highly correlated or noisy. Feature selection is, therefore, an unavoidable pre-processing step if robust and parsimonious pattern classification models are to be developed. In this article, a new strategy for feature selection is introduced and its good performance demonstrated using two MS e-nose databases. The feature selection is conducted in three steps. The first step removes noisy, non-informative features and the second removes highly collinear (i.e., redundant) features. These two steps are computationally inexpensive and dramatically reduce the number of variables (nearly 80% of the initially available features are eliminated after the second step). The third step uses a stochastic variable selection method (simulated annealing) to further reduce the number of variables. For example, applying the method to an Iberian ham database reduced the number of features from 209 to 14. Using the surviving m/z fragments, a fuzzy ARTMAP classifier was able to sort ham samples according to producer and quality (11-category classification) with a 97.24% success rate. The whole feature selection process runs in a few minutes on a Pentium IV PC platform. © 2006 Elsevier B.V. All rights reserved.

This work was funded in part by CICYT under project no. TIC2003-06301, by the Thematic Network in Metabolism and Nutrition ref. C03/08 and by AECI under project no. 39/04/P/E.
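The three-step procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact criteria used in each step, so everything below is an assumption made for illustration: step 1 uses a between-class to within-class variance ratio, step 2 uses a greedy absolute-correlation threshold, and step 3 runs simulated annealing with a leave-one-out 1-nearest-neighbour classifier standing in for the fuzzy ARTMAP classifier used by the authors. All function names, thresholds and the synthetic data are hypothetical.

```python
import numpy as np

def drop_uninformative(X, y, min_ratio=1.0):
    """Step 1 (assumed criterion): keep features whose between-class to
    within-class variance ratio is at least min_ratio."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    ratio = means.var(axis=0) / (within + 1e-12)
    return np.flatnonzero(ratio >= min_ratio)

def drop_collinear(X, idx, max_corr=0.95):
    """Step 2 (assumed criterion): greedily keep a feature only if its absolute
    correlation with every already-kept feature is below max_corr."""
    corr = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    kept = []
    for j in range(len(idx)):
        if all(corr[j, k] < max_corr for k in kept):
            kept.append(j)
    return idx[kept]

def loo_accuracy(X, y):
    """Leave-one-out 1-NN accuracy (a stand-in for fuzzy ARTMAP)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float((y[d.argmin(axis=1)] == y).mean())

def anneal(X, y, idx, n_iter=300, t0=0.1, cooling=0.98, penalty=0.005, seed=0):
    """Step 3: simulated annealing over feature subsets of idx.
    Cost = error rate + penalty * subset size (an assumed cost function)."""
    rng = np.random.default_rng(seed)

    def cost(mask):
        if not mask.any():
            return np.inf  # an empty subset is never acceptable
        return (1.0 - loo_accuracy(X[:, idx[mask]], y)) + penalty * mask.sum()

    mask = np.ones(len(idx), dtype=bool)
    cur = cost(mask)
    best_mask, best = mask.copy(), cur
    t = t0
    for _ in range(n_iter):
        cand = mask.copy()
        j = rng.integers(len(idx))
        cand[j] = ~cand[j]  # flip one feature in or out
        c = cost(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c <= cur or rng.random() < np.exp((cur - c) / t):
            mask, cur = cand, c
            if cur < best:
                best_mask, best = mask.copy(), cur
        t *= cooling
    return idx[best_mask]

def select_features(X, y, seed=0):
    """Chain the three steps and return the surviving column indices."""
    idx = drop_uninformative(X, y)
    idx = drop_collinear(X, idx)
    return anneal(X, y, idx, seed=seed)

def make_demo_data(seed=0, n_per_class=20, n_noise=24):
    """Synthetic data: 3 informative columns, 3 near-copies of column 0, pure noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    informative = 4.0 * np.eye(3)[y] + rng.normal(size=(len(y), 3))
    copies = informative[:, [0]] + 0.01 * rng.normal(size=(len(y), 3))
    noise = rng.normal(size=(len(y), n_noise))
    return np.hstack([informative, copies, noise]), y

X, y = make_demo_data()
print("selected columns:", select_features(X, y))
```

The two cheap filters run first so that the annealing loop, which re-evaluates a classifier at every iteration, only has to search a small candidate set. This mirrors the ordering described in the abstract, though not necessarily the authors' actual criteria or parameters.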

Details

ISSN :
20030630
Database :
OpenAIRE
Journal :
Digital.CSIC. Repositorio Institucional del CSIC
Accession number :
edsair.doi.dedup.....b124530f161e256e33e16b1b05202419