Ensemble multi-label text categorization based on rotation forest and latent semantic indexing
- Source :
- Expert Systems with Applications, Elsevier, 2016, vol. 57, pp. 1-11
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- Text categorization has gained increasing popularity in recent years due to the explosive growth of multimedia documents. As a document can be associated with multiple non-exclusive categories simultaneously (e.g., Virus, Health, Sports, and Olympic Games), text categorization provides many opportunities for developing novel multi-label learning approaches devoted specifically to textual data. In this paper, we propose an ensemble multi-label classification method for text categorization based on four key ideas that together encourage both diversity and individual accuracy in the committee: (1) performing Latent Semantic Indexing based on distinct orthogonal projections onto lower-dimensional spaces of concepts; (2) random splitting of the vocabulary; (3) document bootstrapping; and (4) the use of BoosTexter as a powerful multi-label base learner for text categorization. Diversity of the ensemble is promoted through random splits of the vocabulary that lead to different orthogonal projections onto lower-dimensional latent concept spaces. Accuracy of the committee members is promoted through the underlying latent semantic structure uncovered in the text. The combination of rotation-based ensemble construction and Latent Semantic Indexing projection is shown to yield significant improvements in terms of Average Precision, Coverage, Ranking loss and One-error compared to five state-of-the-art approaches across 14 real-world textual data sets covering a wide variety of topics, including health, education, business, science and arts.
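The ensemble construction outlined in the abstract (random vocabulary splits, document bootstrapping, and one LSI projection per split, assembled in the style of Rotation Forest) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lsi_rotation`, the parameters `n_subsets` and `n_concepts`, the use of a plain uncentered SVD as the LSI step, and the toy data are all hypothetical. The BoosTexter base learner that would be trained on each projected view is not included.

```python
import numpy as np


def lsi_rotation(X, n_subsets, n_concepts, rng):
    """Build one hypothetical LSI-based rotation matrix for one ensemble member.

    X: (n_docs, n_terms) document-term matrix, e.g. tf-idf weights.
    Returns R with shape (n_terms, n_subsets * n_concepts); X @ R is the
    projected representation a base learner (BoosTexter in the paper,
    not implemented here) would be trained on.
    """
    n_docs, n_terms = X.shape
    # (2) Random split of the vocabulary into disjoint term subsets.
    subsets = np.array_split(rng.permutation(n_terms), n_subsets)
    # (3) Document bootstrap: sample document indices with replacement.
    boot = rng.integers(0, n_docs, size=n_docs)
    R = np.zeros((n_terms, n_subsets * n_concepts))
    for j, terms in enumerate(subsets):
        # (1) LSI on the bootstrapped documents restricted to this subset:
        # a plain (uncentered) SVD whose rows of Vt are orthonormal
        # term-space directions ("latent concepts").
        _, _, Vt = np.linalg.svd(X[np.ix_(boot, terms)], full_matrices=False)
        k = min(n_concepts, Vt.shape[0])
        # Write the loadings into this subset's rows only, so columns from
        # different subsets are orthogonal and R keeps the original term order.
        R[terms, j * n_concepts : j * n_concepts + k] = Vt[:k].T
    return R


# Toy usage: 30 documents, 12 terms, 5 ensemble members.
rng = np.random.default_rng(0)
X = rng.random((30, 12))
rotations = [lsi_rotation(X, n_subsets=3, n_concepts=2, rng=rng) for _ in range(5)]
views = [X @ R for R in rotations]  # one 30 x 6 projected view per member
```

Mirroring Rotation Forest, each projection is learned on a bootstrap sample but applied to all training documents (`X @ R`). Because every member draws a different vocabulary split and bootstrap sample, the members see different orthogonal latent views of the same corpus, which is the source of ensemble diversity described above.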
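The four evaluation measures named in the abstract (Average Precision, Coverage, Ranking loss and One-error) are standard ranking-based multi-label metrics. The plain-Python sketch below follows their usual definitions. Details such as tie handling may differ from the implementation used in the paper. The names `Y` (set of relevant label indices per document) and `S` (one real-valued score per label), the helper `ranks`, and the toy data are illustrative assumptions. The sketch assumes each document has at least one relevant and one irrelevant label. Lower is better for Coverage, Ranking loss and One-error, and higher is better for Average Precision.

```python
def ranks(scores):
    """Rank of every label for one document: 1 = highest score.

    Assumes the scores within a document are distinct (no tie handling).
    """
    return [1 + sum(other > t for other in scores) for t in scores]


def one_error(Y, S):
    """Fraction of documents whose top-ranked label is not relevant."""
    errors = 0
    for y, s in zip(Y, S):
        top = max(range(len(s)), key=lambda j: s[j])
        errors += top not in y
    return errors / len(Y)


def coverage(Y, S):
    """Average number of steps down the ranking needed to cover all relevant labels."""
    total = 0
    for y, s in zip(Y, S):
        r = ranks(s)
        total += max(r[j] for j in y) - 1
    return total / len(Y)


def ranking_loss(Y, S):
    """Average fraction of (relevant, irrelevant) label pairs that are mis-ordered."""
    total = 0.0
    for y, s in zip(Y, S):
        neg = [j for j in range(len(s)) if j not in y]
        bad = sum(s[a] <= s[b] for a in y for b in neg)
        total += bad / (len(y) * len(neg))
    return total / len(Y)


def average_precision(Y, S):
    """Mean, over relevant labels, of the precision at that label's rank."""
    total = 0.0
    for y, s in zip(Y, S):
        r = ranks(s)
        total += sum(sum(r[k] <= r[j] for k in y) / r[j] for j in y) / len(y)
    return total / len(Y)


# Toy example: 2 documents, 3 labels (indices 0..2).
Y = [{0, 2}, {1}]
S = [[0.9, 0.5, 0.3], [0.6, 0.4, 0.1]]
print(one_error(Y, S))          # 0.5
print(coverage(Y, S))           # 1.5
print(ranking_loss(Y, S))       # 0.5
print(average_precision(Y, S))  # about 0.667 (= 2/3)
```

In the toy example, the second document's only relevant label (index 1) is ranked second. That single mistake produces the One-error of 0.5 and contributes one step to Coverage.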
- Subjects :
- [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
Vocabulary
Computer science
02 engineering and technology
Machine learning
Ranking (information retrieval)
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Artificial Intelligence
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Multi-label classification
Structure (mathematical logic)
Probabilistic latent semantic analysis
4. Education
General Engineering
Bootstrapping (linguistics)
Ensemble learning
Computer Science Applications
Projection (relational algebra)
Ranking
020201 artificial intelligence & image processing
Natural language processing
Latent semantic indexing
Details
- ISSN :
- 0957-4174
- Volume :
- 57
- Database :
- OpenAIRE
- Journal :
- Expert Systems with Applications
- Accession number :
- edsair.doi.dedup.....b1143760295956ca13ce3e63e2d2763c
- Full Text :
- https://doi.org/10.1016/j.eswa.2016.03.041