
New Splitting Criteria for Decision Trees in Stationary Data Streams.

Authors :
Jaworski, Maciej
Duda, Piotr
Rutkowski, Leszek
Source :
IEEE Transactions on Neural Networks & Learning Systems. Jun2018, Vol. 29 Issue 6, p2516-2529. 14p.
Publication Year :
2018

Abstract

The most popular tools for stream data mining are based on decision trees. Over the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding’s inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding’s inequality. Therefore, there is an urgent need to develop new algorithms that are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. A general division of splitting criteria into two types is proposed. Attributes chosen based on type-$I$ splitting criteria guarantee, with high probability, the highest expected value of the split measure. Type-$II$ criteria ensure that the chosen attribute is, with high probability, the same as the one that would be chosen based on the whole infinite data stream. Moreover, two hybrid splitting criteria are proposed, which combine the single criteria based on the misclassification error and the Gini index. [ABSTRACT FROM AUTHOR]
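
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the two impurity measures (Gini index and misclassification error), the impurity reduction of a binary split, and the classical Hoeffding bound used by Hoeffding-tree methods. It does not reproduce the paper's new splitting criteria, which the abstract does not state. All function names are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, Sequence


def gini_index(counts: Sequence[int]) -> float:
    """Gini impurity of a node, given per-class example counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def misclassification_error(counts: Sequence[int]) -> float:
    """Fraction of examples outside the majority class."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - max(counts) / n


def split_measure(
    impurity: Callable[[Sequence[int]], float],
    left: Sequence[int],
    right: Sequence[int],
) -> float:
    """Impurity reduction of a binary split (parent minus weighted children)."""
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    if n == 0:
        return 0.0
    weighted = (n_left / n) * impurity(left) + (n_right / n) * impurity(right)
    return impurity(parent) - weighted


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Classical Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)).

    This is the bound used by Hoeffding-tree methods, which the abstract
    describes as heuristic in this setting; it is not the paper's criterion.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Two candidate splits of the same 10 examples (class counts per branch).
    print(split_measure(gini_index, [5, 0], [0, 5]))
    print(split_measure(misclassification_error, [4, 1], [1, 4]))
    print(hoeffding_epsilon(1.0, 0.05, 1000))
```

In a Hoeffding-tree setting, a node is typically split once the gap between the best and second-best attribute's split measure exceeds the bound; the abstract's point is that this use of Hoeffding's inequality is not mathematically justified for such measures.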

Details

Language :
English
ISSN :
2162-237X
Volume :
29
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Transactions on Neural Networks & Learning Systems
Publication Type :
Periodical
Accession number :
129655411
Full Text :
https://doi.org/10.1109/TNNLS.2017.2698204