Complete Random Forest Based Class Noise Filtering Learning for Improving the Generalizability of Classifiers.
- Source :
- IEEE Transactions on Knowledge & Data Engineering. Nov 2019, Vol. 31 Issue 11, p2063-2078. 16p.
- Publication Year :
- 2019
Abstract
- Existing noise detection methods rely on classifiers, distance measurements, or the overall data distribution, and the 'curse of dimensionality' and other restrictions make them insufficiently effective on complex data, e.g., data with different attribute weights, high dimensionality, feature noise, or nonlinearity. This is also the main reason that existing noise filtering methods have not been widely applied and have not formed an effective learning framework. To address this problem, we propose a complete and efficient random forest method (CRF) specifically for class noise detection, which works by simulating grid generation and expansion. The CRF is not based on distance measures, overall distribution, or classifiers; in addition, its voting mechanism enables it to effectively process datasets containing feature noise. Furthermore, we introduce a CRF-based class noise filtering learning framework (CRF-NFL) and derive its mathematical model. The framework is then applied to many widely used classifiers, including some state-of-the-art algorithms, e.g., k-means tree, GBDT, and XGBoost. Moreover, a parallelized version is designed for large-scale data. CRF-NFL shows much better generalizability than the conventional classifiers and the relative density-based method, which is the most effective noise filtering method as far as we know. All of this research has been released as an open source library, called CRF-NFL: http://www.cquptshuyinxia.com/CRF-NFL.html. [ABSTRACT FROM AUTHOR]
- Subjects :
- *NOISE
- *DATA distribution
- *NOISE measurement
Details
- Language :
- English
- ISSN :
- 10414347
- Volume :
- 31
- Issue :
- 11
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Knowledge & Data Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 139076918
- Full Text :
- https://doi.org/10.1109/TKDE.2018.2873791