1. Parallel deep forest based on mutual information and mixed weighting.
- Author
- 毛伊敏 and 李文豪
- Abstract
In big data environments, the parallel deep forest algorithm suffers from an abundance of irrelevant and redundant features, imbalanced multi-granularity scanning, inadequate classification performance, and low parallelization efficiency. To address these issues, this paper proposes a parallel deep forest algorithm based on mutual information and mixed weighting (PDF-MIMW). First, in the dimensionality reduction phase, the algorithm introduces a feature extraction strategy based on mutual information (FE-MI), which filters the original feature set by combining feature importance, interaction, and redundancy metrics, thereby eliminating irrelevant and redundant features. Next, in the multi-granularity scanning phase, it applies an improved multi-granularity scanning strategy based on padding (IMGS-P), which pads the reduced features and randomly samples the subsequences obtained from window scanning, thereby ensuring balanced multi-granularity scanning. Then, in the cascade forest construction phase, it uses a sub-forest construction strategy based on mixed weighting (SFCMW), which builds weighted sub-forests in parallel on the Spark framework to improve the model's classification performance. Finally, in the class vector merging phase, it applies a load balancing strategy based on a mixed particle swarm algorithm, which optimizes the load distribution among task nodes in the Spark framework, reducing waiting time during class vector merging and improving the model's parallelization efficiency. Experiments demonstrate that PDF-MIMW achieves superior classification performance and higher training efficiency in big data environments.
- Published
- 2024
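
The mutual-information filtering step named in the abstract (FE-MI) can be illustrated with a minimal sketch. This is not the authors' method: it is a generic greedy relevance-minus-redundancy selection over discrete features, it omits the interaction term the abstract mentions, and the names `mutual_information`, `select_features`, and `redundancy_weight` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, k, redundancy_weight=1.0):
    """Greedily pick k feature indices.

    Assumption: score = relevance to the labels minus the mean mutual
    information with already selected features (an mRMR-style rule),
    standing in for the paper's combined metric.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy_weight * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Feature 1 duplicates feature 0, so it is skipped as redundant.
    columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    labels = [0, 1, 2, 3]
    print(select_features(columns, labels, k=2))
```

In the toy data, all three features are equally relevant to the labels, but the duplicate column carries no new information once the first is chosen, so the selection falls on the first and third columns.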