Rocchio algorithm-based particle initialization mechanism for effective PSO classification of high dimensional data
- Source :
- Swarm and Evolutionary Computation. 34:18-32
- Publication Year :
- 2017
- Publisher :
- Elsevier BV, 2017.
- Abstract :
- In recent years, there has been a growing interest in applying Particle Swarm Optimization (PSO) to data classification. Nonetheless, due to the curse of dimensionality, the effectiveness of PSO applied to high dimensional data classification becomes questionable. This paper proposes a novel specialized PSO initialization mechanism, developed specifically for PSO applications to high dimensional data classification. The proposed initialization mechanism is inspired by the center-based sampling theory, which argues that the center of the search space is a promising region for the initialization step in evolutionary algorithms. Furthermore, the proposed initialization mechanism is based on an information retrieval algorithm called the Rocchio Algorithm (RA), which identifies the center region of the search space of data classification. To validate the proposed mechanism, RA-based PSO has been applied to a high dimensional classification task in educational data mining. More specifically, RA-based PSO has been applied to classify a dataset of teachers' classroom questions into Bloom's taxonomy cognitive levels. To do so, a dataset of teachers' classroom questions has been collected and annotated manually with Bloom's taxonomy cognitive levels. Pre-processing steps have been applied to convert questions into a representation suitable for classification. Using this dataset, the standard PSO, PSO with generic initialization mechanisms, and RA-based PSO have been evaluated and compared. The results show poor performance for the standard PSO and for PSO with the generic initialization mechanisms, and a significant improvement in the performance of RA-based PSO. These results indicate that a proper task-specific PSO initialization mechanism is crucial for effective PSO performance in high dimensional data classification.
Furthermore, a comparison between RA-based PSO and pure RA classification provides a quantitative estimate of the roles of the initialization mechanism and the PSO search in classifying the dataset. In addition, a comparison between the RA-based PSO approach and three conventional machine learning approaches, evaluated on the same dataset, confirms the effectiveness of RA-based PSO for high dimensional data classification. Moreover, a comparison between the RA-based PSO approach and the machine learning approaches in terms of computational time efficiency shows that they are comparable in classification time. However, because PSO learning is a time-consuming process, its effectiveness is significantly affected when learning time is a concern.
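The abstract does not spell out how particles are encoded or how RA seeds them, so the following is only a minimal sketch under stated assumptions: each class is summarized by the mean of its feature vectors (the simplest centroid form of Rocchio), a particle holds one prototype per class, and the swarm is seeded by adding small Gaussian noise to those centroids. The function names, the `sigma` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def rocchio_centroids(vectors, labels):
    """Return {label: centroid}, the mean of each class's feature vectors.

    Assumption: the simplest centroid form of Rocchio, with no negative-class term.
    """
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def init_swarm(centroids, n_particles, sigma=0.05, seed=0):
    """Seed particles near the centroids instead of uniformly at random.

    Assumption: a particle is a flat list holding one prototype per class
    (classes in sorted order), perturbed with Gaussian noise of std sigma.
    """
    rng = random.Random(seed)
    base = [v for lab in sorted(centroids) for v in centroids[lab]]
    return [[v + rng.gauss(0.0, sigma) for v in base] for _ in range(n_particles)]


def classify(particle, vec, class_names):
    """Assign vec to the class whose prototype in the particle is nearest
    (squared Euclidean distance). class_names must be in sorted order."""
    dim = len(vec)
    best, best_dist = None, float("inf")
    for i, name in enumerate(class_names):
        proto = particle[i * dim:(i + 1) * dim]
        dist = sum((p - v) ** 2 for p, v in zip(proto, vec))
        if dist < best_dist:
            best, best_dist = name, dist
    return best


# Toy two-feature data, invented purely for illustration.
X = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
y = ["recall", "recall", "analyze", "analyze"]

centroids = rocchio_centroids(X, y)
swarm = init_swarm(centroids, n_particles=5)
predicted = classify(swarm[0], [1.0, 0.0], sorted(centroids))
```

A standard PSO loop would then update these particles using a fitness such as training accuracy. The sketch stops at initialization, since that is the step the abstract identifies as decisive.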
- Subjects :
- Rocchio algorithm
Clustering high-dimensional data
General Computer Science
Computer science
business.industry
General Mathematics
Data classification
MathematicsofComputing_NUMERICALANALYSIS
Evolutionary algorithm
Initialization
Particle swarm optimization
020206 networking & telecommunications
02 engineering and technology
computer.software_genre
Machine learning
ComputingMethodologies_ARTIFICIALINTELLIGENCE
Educational data mining
ComputingMethodologies_PATTERNRECOGNITION
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
Artificial intelligence
business
computer
Curse of dimensionality
Details
- ISSN :
- 22106502
- Volume :
- 34
- Database :
- OpenAIRE
- Journal :
- Swarm and Evolutionary Computation
- Accession number :
- edsair.doi...........179c075ebc9bbc8ac3f971ade8055291
- Full Text :
- https://doi.org/10.1016/j.swevo.2016.11.005