1. MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification
- Author
- Vinh Truong Hoang and Wei Xu
- Subjects
Computer Networks and Communications, Computer science, Big data, Data classification, Networking & telecommunications, Engineering and technology, Bottleneck, Data modeling, Random forest, Weighting, Hardware and Architecture, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Data mining, Distributed File System, Reference model, Software, Information Systems
- Abstract
This paper takes educational data mining as its research theme: it mines existing massive educational data, compares the analysis methods of existing data models, and proposes an improved random forest (RF) reference model. A feature weighting scheme is introduced to calculate the information gain of each feature, and an evaluation index is used to improve on existing data analysis. Simulation results show that the improved model classifies more efficiently than the existing models. To resolve the performance bottleneck of a single node in multiple data classification tasks in the era of big data, a classification and prediction model for large-scale graduate employment data, based on a distributed improved RF algorithm, is proposed. The MapReduce distributed computing framework is used to complete the serialized writing and deserialized loading of the training model between the local disk and the distributed file system, thereby realizing the distributed expansion of the large-scale data classification model based on the improved RF model.
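The abstract does not spell out the feature weighting scheme, so the following is only a minimal sketch of one plausible reading: compute the information gain of each categorical feature against the class label, then normalize the gains into weights. The function names, the normalization step, and the toy field names (`major`, `gender`) are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(rows, labels):
    """Normalize per-feature information gain so the weights sum to 1.

    Assumption: the paper's actual weighting formula is not given in the
    abstract; simple normalization is used here as a stand-in.
    """
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    total = sum(gains.values())
    if total == 0:
        # No feature is informative: fall back to uniform weights.
        return {name: 1 / len(gains) for name in gains}
    return {name: gain / total for name, gain in gains.items()}


# Hypothetical toy records: "major" separates the classes, "gender" does not.
rows = [
    {"major": "a", "gender": "m"},
    {"major": "a", "gender": "f"},
    {"major": "b", "gender": "m"},
    {"major": "b", "gender": "f"},
]
labels = [1, 1, 0, 0]
weights = feature_weights(rows, labels)
```

In a weighted random forest, weights like these could bias which features are sampled at each split, so that informative features are drawn more often than uninformative ones; whether the paper uses them this way is not stated in the abstract.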
- Published
- 2021