An empirical comparison of ensemble methods based on classification trees.
- Author
- Hamza, Mounir and Larocque, Denis
- Subjects
- *CLASSIFICATION, *ALGORITHMS, *STATISTICAL correlation, *MATHEMATICAL statistics, *PERMUTATIONS
- Abstract
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. The comparison uses 14 publicly available data sets that were also used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. [2000]. In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets.
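The kind of comparison the abstract describes can be sketched with scikit-learn (not the authors' software, and on a single illustrative public dataset rather than the 14 used in the paper). The `max_features` parameter stands in for the RF tuning knob mentioned in the abstract, the number of candidate variables per split; the dataset and estimator settings here are assumptions for illustration only.

```python
# Minimal sketch of an empirical error comparison: single tree vs. Bagging
# vs. Boosting vs. random forests, estimated by 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative public dataset

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # max_features="sqrt": number of variables considered at each split,
    # the RF parameter whose effect the paper studies
    "random forest": RandomForestClassifier(n_estimators=50,
                                            max_features="sqrt",
                                            random_state=0),
}

errors = {}
for name, model in models.items():
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    errors[name] = 1.0 - accuracy  # classification error, as in the paper

for name, err in errors.items():
    print(f"{name}: error = {err:.3f}")
```

A fuller reproduction would repeat this over all 14 Lim-Loh-Shih data sets, add injected noise, and vary `max_features`, which is the design the abstract outlines.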
- Published
- 2005