Text Classification without Negative Examples Revisit.

Authors :
Gabriel Pui Cheong Fung
Yu, Jeffrey X.
Hongjun Lu
Yu, Philip S.
Source :
IEEE Transactions on Knowledge & Data Engineering. Jan2006, Vol. 18 Issue 1, p6-20. 15p.
Publication Year :
2006

Abstract

Traditionally, building a classifier requires two sets of examples: positive examples and negative examples. This paper studies the problem of building a text classifier using positive examples (P) and unlabeled examples (U). The unlabeled examples are mixed with both positive and negative examples. Since no negative example is given explicitly, the task of building a reliable text classifier becomes far more challenging. Simply treating all of the unlabeled examples as negative examples and building a classifier thereafter is undoubtedly a poor approach to tackling this problem. Generally speaking, most of the studies solved this problem by a two-step heuristic: First, extract negative examples (N) from U. Second, build a classifier based on P and N. Surprisingly, most studies did not try to extract positive examples from U. Intuitively, enlarging P by P' (positive examples extracted from U) and building a classifier thereafter should enhance the effectiveness of the classifier. Throughout our study, we find that extracting P' is very difficult. A document in U that possesses the features exhibited in P does not necessarily mean that it is a positive example, and vice versa. The very large size of and very high diversity in U also contribute to the difficulties of extracting P'. In this paper, we propose a labeling heuristic called PNLH to tackle this problem. PNLH aims at extracting high quality positive examples and negative examples from U and can be used on top of any existing classifiers. Extensive experiments based on several benchmarks are conducted. The results indicated that PNLH is highly feasible, especially in the situation where |P| is extremely small. [ABSTRACT FROM AUTHOR]
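
Editorial illustration: the generic two-step heuristic mentioned in the abstract (extract N from U, then build a classifier from P and N) might be sketched roughly as follows. This is not PNLH and not taken from the paper; the centroid-and-cosine approach, the function names, the threshold value, and the toy documents are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Sum of term-count vectors (cosine only needs the direction)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

def extract_negatives(P, U, threshold=0.1):
    """Step 1 (assumed rule): unlabeled documents far from the
    positive centroid are treated as negative examples N."""
    p_centroid = centroid(P)
    return [u for u in U if cosine(u, p_centroid) < threshold]

def build_classifier(P, N):
    """Step 2: nearest-centroid classifier built from P and the extracted N."""
    p_centroid, n_centroid = centroid(P), centroid(N)
    def classify(doc):
        return cosine(doc, p_centroid) > cosine(doc, n_centroid)
    return classify

# Toy data, invented for illustration only.
positive = [vectorize(t) for t in [
    "stock market shares rise",
    "market shares fall on stock news",
]]
unlabeled = [vectorize(t) for t in [
    "stock prices and market trends",
    "football match ends in draw",
    "team wins football cup final",
]]

negatives = extract_negatives(positive, unlabeled)
classify = build_classifier(positive, negatives)
```

In this toy run the unlabeled document about stock prices is left out of N because it resembles P, which also hints at the abstract's point: such a document is a candidate for P', but resemblance to P alone does not establish that it is positive.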

Details

Language :
English
ISSN :
10414347
Volume :
18
Issue :
1
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
19253604
Full Text :
https://doi.org/10.1109/TKDE.2006.16