1. Document classification system based on HMM word map
- Author
- Tsimboukakis Nikolaos and Tambouratzis George
- Subjects
- business.industry, Computer science, Document classification, Feature extraction, Modern Greek, Linear classifier, Pattern recognition, computer.software_genre, ComputingMethodologies_PATTERNRECOGNITION, Connectionism, Multilayer perceptron, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Artificial intelligence, business, Hidden Markov model, Classifier (UML), computer
- Abstract
- In this article, a system based on Hidden Markov Models (HMMs) for document organization is presented. The purpose of the system is to classify a document collection by document content. The system has a two-level hybrid connectionist architecture comprising (i) a word map created automatically using an HMM, which functions as a feature extraction module, and (ii) a supervised classifier based on a multilayer perceptron (MLP), which provides the final classification result. A series of experiments performed on Modern Greek text-only documents is presented. These experiments illustrate the effectiveness of the proposed system.
- Published
- 2008