Small margin ensembles can be robust to class-label noise
- Author
Alberto Suárez, Maryam Sabzevari, Gonzalo Martínez-Muñoz; UAM. Departamento de Ingeniería Informática; Aprendizaje Automático (ING EPS-001)
- Subjects
Informática, Small margin classifiers, Training set, business.industry, Computer science, Cognitive Neuroscience, Contrast (statistics), Pattern recognition, Base (topology), Class (biology), Computer Science Applications, Random forest, Noise, Label noise, Artificial Intelligence, Margin (machine learning), Bagging, Range (statistics), Artificial intelligence, business, Bootstrapping (statistics), Bootstrap sampling
- Abstract
This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, Vol. 160 (2015), DOI 10.1016/j.neucom.2014.12.086.

Subsampling is used to generate bagging ensembles that are accurate and robust to class-label noise. Training the base learners on smaller bootstrap samples makes the ensemble more diverse. As a result, the classification margins tend to decrease. In spite of having small margins, these ensembles can be robust to class-label noise. The validity of these observations is illustrated in a wide range of synthetic and real-world classification tasks. In the problems investigated, subsampling significantly outperforms standard bagging for different amounts of class-label noise. By contrast, the effectiveness of subsampling in random forests is problem dependent. In this type of ensemble, the best overall accuracy is obtained when the random trees are built on bootstrap samples of the same size as the original training data. Nevertheless, subsampling becomes more effective as the amount of class-label noise increases.

The authors acknowledge financial support from the Spanish Plan Nacional I+D+i Grant TIN2013-42351-P and from Comunidad de Madrid Grant S2013/ICE-2845 CASI-CAM-CM.
- Published
- 2015
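The abstract's central mechanism, bagging in which each base learner is trained on a bootstrap sample smaller than the training set, together with the voting margin it affects, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners are simple unpruned Gini trees, the margin is taken as the fraction of votes for the true class minus the largest fraction for any other class, and the names `subsampled_bagging`, `sampling_ratio` and `make_data` (a toy 2-D task with flipped labels) are hypothetical. No particular accuracy or margin values are implied.

```python
import random
from collections import Counter


def gini(counts, n):
    """Gini impurity from a Counter of class labels over n examples."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def build_tree(X, y):
    """Grow an unpruned CART-style tree with axis-aligned Gini splits (assumed base learner).

    Leaves are ("leaf", label); internal nodes are ("node", feature, threshold, left, right).
    """
    n = len(y)
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return ("leaf", majority)
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left, right = Counter(), Counter(y)
        for pos in range(n - 1):
            i, j = order[pos], order[pos + 1]
            left[y[i]] += 1
            right[y[i]] -= 1
            if X[i][f] == X[j][f]:
                continue  # no threshold separates equal feature values
            n_left = pos + 1
            score = n_left * gini(left, n_left) + (n - n_left) * gini(right, n - n_left)
            if best is None or score < best[0]:
                best = (score, f, (X[i][f] + X[j][f]) / 2.0)
    if best is None:  # identical inputs with conflicting labels: stop with a majority leaf
        return ("leaf", majority)
    _, f, t = best
    go_left = [x[f] <= t for x in X]
    return ("node", f, t,
            build_tree([x for x, g in zip(X, go_left) if g], [c for c, g in zip(y, go_left) if g]),
            build_tree([x for x, g in zip(X, go_left) if not g], [c for c, g in zip(y, go_left) if not g]))


def predict_tree(tree, x):
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]


def subsampled_bagging(X, y, n_trees=31, sampling_ratio=0.2, seed=0):
    """Each tree sees a bootstrap sample (with replacement) of size round(sampling_ratio * N).

    sampling_ratio=1.0 corresponds to standard bagging.
    """
    rng = random.Random(seed)
    n = len(y)
    m = max(1, round(sampling_ratio * n))
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(m)]
        ensemble.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict(ensemble, x):
    """Unweighted majority vote of the trees."""
    return Counter(predict_tree(t, x) for t in ensemble).most_common(1)[0][0]


def margin(ensemble, x, label):
    """Fraction of votes for `label` minus the largest fraction for any other class, in [-1, 1]."""
    votes = Counter(predict_tree(t, x) for t in ensemble)
    other = max((c for k, c in votes.items() if k != label), default=0)
    return (votes[label] - other) / len(ensemble)


def make_data(n, noise, rng):
    """Hypothetical toy task: class 1 iff x0 + x1 > 1; a fraction `noise` of labels is flipped."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(a + b > 1) for a, b in X]
    flip = set(rng.sample(range(n), int(noise * n)))
    return X, [1 - c if i in flip else c for i, c in enumerate(y)]


if __name__ == "__main__":
    rng = random.Random(42)
    X_tr, y_tr = make_data(200, noise=0.2, rng=rng)
    X_te, y_te = make_data(500, noise=0.0, rng=rng)
    for ratio in (1.0, 0.2):  # standard bagging vs. subsampled bagging
        ens = subsampled_bagging(X_tr, y_tr, sampling_ratio=ratio)
        acc = sum(predict(ens, x) == c for x, c in zip(X_te, y_te)) / len(y_te)
        avg_margin = sum(margin(ens, x, c) for x, c in zip(X_tr, y_tr)) / len(y_tr)
        print(f"ratio={ratio}: test accuracy={acc:.3f}, mean training margin={avg_margin:.3f}")
```

Running the script prints, for each sampling ratio, the clean-test accuracy and the mean margin on the noisy training set, so the two quantities discussed in the abstract can be compared side by side on this toy problem. Because unpruned trees fit their own bootstrap sample, a smaller `sampling_ratio` means each training point influences fewer trees, which is one intuitive reason why margins shrink under subsampling.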