1. AN EFFICIENT MODEL FOR ENHANCING TEXT CATEGORIZATION USING SENTENCE SEMANTICS
- Author
- Shady Shehata, Mohamed S. Kamel, and Fakhri Karray
- Subjects
Information retrieval, Phrase, business.industry, Computer science, Meaning (non-linguistic), Term (logic), Semantics, computer.software_genre, Computational Mathematics, Categorization, Artificial Intelligence, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Formal concept analysis, Artificial intelligence, business, Set (psychology), computer, Natural language processing, Sentence
- Abstract
Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify the terms that capture the semantics of the text. Such a model can capture the terms that represent the concepts of each sentence, which leads to discovering the topic of the document. A new concept-based model is introduced that analyzes terms at the sentence, document, and corpus levels rather than at the document level only, as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. A set of experiments applying the proposed concept-based model to text categorization on different datasets is conducted and compared with the traditional models. The results demonstrate a substantial enhancement of categorization quality using the sentence-based, document-based, and corpus-based concept analysis.
- Published
- 2010
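
The abstract's central observation, that two terms can share the same document frequency while contributing differently to sentence meaning, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how concepts are identified, so the sketch assumes each sentence arrives with a hand-supplied set of concept terms. The function name `analyze` and the field names `tf`, `concept_sentence_count`, and `df` are hypothetical.

```python
from collections import Counter


def analyze(corpus):
    """Count terms at three levels.

    corpus: list of documents; each document is a list of
    (tokens, concept_terms) pairs, one pair per sentence.
    ASSUMPTION: concept_terms is supplied by some upstream semantic
    analysis that the abstract does not specify.
    """
    doc_stats = []
    df = Counter()  # corpus level: documents in which a term is a concept
    for document in corpus:
        tf = Counter()  # document level: raw term frequency
        concept_sentence_count = Counter()  # sentence level
        for tokens, concept_terms in document:
            tf.update(tokens)
            # Count each term once per sentence in which it is a concept.
            concept_sentence_count.update(
                t for t in set(tokens) if t in concept_terms
            )
        doc_stats.append(
            {"tf": tf, "concept_sentence_count": concept_sentence_count}
        )
        df.update(concept_sentence_count.keys())
    return doc_stats, df


# Toy corpus with hand-annotated concept terms (illustrative only).
corpus = [
    [
        (["the", "model", "ranks", "terms"], {"model", "terms"}),
        (["the", "model", "uses", "statistics"], {"model", "statistics"}),
    ],
    [
        (["the", "corpus", "contains", "documents"], {"corpus", "documents"}),
    ],
]

stats, df = analyze(corpus)
first = stats[0]
for term in ("the", "model"):
    print(term, first["tf"][term], first["concept_sentence_count"][term])
```

In the first toy document, "the" and "model" both occur twice, yet only "model" is marked as a concept in any sentence, so a frequency-only view cannot separate them while the sentence-level count can.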