Automatic Evaluation of Document Classification Using N-Gram Statistics.

Authors :
Choi, Dongjin
Ko, Byeongkyu
Lee, Eunji
Hwang, Myunggwon
Kim, Pankoo
Source :
2012 15th International Conference on Network-Based Information Systems; 1/1/2012, p739-742, 4p
Publication Year :
2012

Abstract

With the growth of World Wide Web technologies, people are now surrounded by trillions of web pages, and the web continues to grow dramatically. As a result, it is becoming more difficult to find web documents relevant to what users want to read. Classifying documents into predefined categories is one of the most important tasks in the field of Natural Language Processing. Over the years, many statistical and linguistic approaches have been applied to improve on traditional classifiers. However, the problem remains unsolved: there is not yet a perfect way for machines to understand human language, so every possibility for making machines think as humans do has to be considered. In this paper, we propose a method for classifying textual documents using n-gram co-occurrence statistics, which show great potential for finding similarities between given documents. We also compare the proposed method with the traditional method suggested by Keselj. This paper covers only simple approaches and still needs more sophisticated experiments. However, the proposed method performs better than the Keselj approach. [ABSTRACT FROM PUBLISHER]

Details

Language :
English
ISBNs :
9781467323314
Database :
Complementary Index
Journal :
2012 15th International Conference on Network-Based Information Systems
Publication Type :
Conference
Accession number :
86494886
Full Text :
https://doi.org/10.1109/NBiS.2012.96