Dependency maximization forward feature selection algorithms based on normalized cross-covariance operator and its approximated form for high-dimensional data.
- Author
- Xu, Jianhua, Lu, Wenkai, Li, Jun, and Yuan, Hongli
- Subjects
- *FEATURE selection, *MATRIX inversion, *ALGORITHMS, *COLUMNS, *COMPUTATIONAL complexity, *EYE drops, *VIDEO coding
- Abstract
- • A kernel-based feature selection algorithm is proposed, based on the primary normalized cross-covariance operator (NOCCO). • A fast version is implemented by combining approximated NOCCO with a column-recursive method for the Moore–Penrose inverse matrix. • Their effectiveness and efficiency are validated through detailed experimental comparison and analysis on nine benchmark data sets.
- Supervised feature selection (FS) for classification aims at finding a more discriminative subset of the original features, in order to facilitate classifier training, improve classification performance, and enhance model interpretability. The primary normalized cross-covariance operator (NOCCO) is a nonlinear kernel-based dependency measure between features and labels that involves two inverse matrices; its approximated version (ANOCCO) is simplified by using a linear kernel for features, a delta kernel for labels, and the Moore–Penrose inverse. In this paper, we apply NOCCO and ANOCCO to the FS task. Following sequential forward selection, a forward NOCCO-based FS algorithm (FoNOCCO) is implemented directly with various accelerating strategies, but its computational complexity is extremely high. To this end, we propose a fast version that maximizes ANOCCO (FoANOCCO), in which each candidate feature is evaluated efficiently via a column-recursive algorithm for the Moore–Penrose inverse. Theoretical analysis shows that evaluating a candidate feature has a time complexity of O(N²) in FoANOCCO versus O(N³) in FoNOCCO, where N is the number of instances. On eight small-size high-dimensional benchmark data sets, our detailed experiments illustrate that our two forward FS algorithms are superior to six state-of-the-art FS methods under four baseline classifiers, and that FoANOCCO runs 70 times faster than FoNOCCO with only a 1.79% accuracy drop. [ABSTRACT FROM AUTHOR]
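The abstract's core idea can be illustrated with a minimal sketch: a kernel-based dependence score between features and labels (here using the linear kernel for features and the delta kernel for labels, the simplifications the abstract attributes to ANOCCO, with a standard ridge-regularized inverse in place of the paper's Moore–Penrose construction), plus greedy sequential forward selection that maximizes it. Function names and the regularization constant `eps` are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def nocco_dependence(X, y, eps=1e-3):
    """NOCCO-style dependence tr(R_y @ R_x) between features X and labels y.

    Assumptions (not from the paper): linear kernel for X, delta kernel
    for y, and a ridge-regularized inverse rather than the paper's
    column-recursive Moore-Penrose inverse.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                      # centering matrix
    Kx = H @ (X @ X.T) @ H                                   # centered linear kernel
    Ky = H @ (y[:, None] == y[None, :]).astype(float) @ H    # centered delta kernel
    Rx = Kx @ np.linalg.inv(Kx + n * eps * np.eye(n))        # normalized operators
    Ry = Ky @ np.linalg.inv(Ky + n * eps * np.eye(n))
    return np.trace(Ry @ Rx)

def forward_select(X, y, k):
    """Greedy sequential forward selection maximizing the dependence score."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining,
                   key=lambda j: nocco_dependence(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that this naive sketch recomputes and inverts the full kernel matrix for every candidate, which is the O(N³)-per-candidate cost the paper attributes to FoNOCCO; the column-recursive update of the Moore–Penrose inverse is exactly what lets FoANOCCO reduce each evaluation to O(N²).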
- Published
- 2022