Let us know your decision: Pool-based active training of a generative classifier with the selection strategy 4DS

Authors :
Reitmaier, Tobias
Sick, Bernhard
Source :
Information Sciences. May 2013, Vol. 230, pp. 106-131 (26 pages).
Publication Year :
2013

Abstract

In this article, we introduce and investigate 4DS, a new selection strategy for pool-based active training of a generative classifier, namely CMM (classifier based on a probabilistic mixture model). Such a generative classifier aims at modeling the processes underlying the “generation” of the data. 4DS considers the distance of samples (observations) to the decision boundary, the density in the regions where samples are selected, the diversity of the samples in the query set that are chosen for labeling, and, indirectly, the unknown class distribution of the samples by utilizing the responsibilities of the model components for these samples. The combination of the four measures in 4DS is self-optimizing in the sense that the weights of the distance, density, and class distribution measures depend on the currently estimated performance of the classifier. On 17 benchmark data sets, 4DS is shown to outperform a random selection strategy (baseline method), a pure closest sampling approach, ITDS (information theoretic diversity sampling), DWUS (density-weighted uncertainty sampling), DUAL (dual strategy for active learning), PBAC (prototype based active learning), and 3DS (a technique we proposed earlier that does not consider responsibility information) on various evaluation criteria, such as ranked performance based on classification accuracy, number of labeled samples (data utilization), and learning speed assessed by the area under the learning curve. It is also shown that, due to the use of responsibility information, 4DS solves a key problem of active learning: the class distribution of the samples chosen for labeling actually approximates the unknown “true” class distribution of the overall data set quite well.
With this article, we also pave the way for advanced selection strategies for the active training of discriminative classifiers such as support vector machines or decision trees: we show that responsibility information derived from generative models can successfully be employed to improve the training of those classifiers. [Copyright Elsevier]
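
The abstract names the measures that 4DS combines but does not give their formulas. The following Python sketch only illustrates the general idea of scoring pool samples by a weighted sum of distance, density, and diversity terms. It is not the authors' method: the function names, the margin-based closeness term, the Gaussian-kernel density term, and the fixed weights are all assumptions made for illustration, and the class distribution measure and the self-optimizing weights of 4DS are left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closeness_to_boundary(posteriors):
    """Assumed stand-in for the distance measure: 1 minus the margin between
    the two most probable classes (needs at least two classes)."""
    top = sorted(posteriors, reverse=True)
    return 1.0 - (top[0] - top[1])


def density(i, pool, bandwidth=1.0):
    """Assumed stand-in for the density measure: mean Gaussian-kernel
    similarity of sample i to all other samples in the pool."""
    others = [p for j, p in enumerate(pool) if j != i]
    if not others:
        return 0.0
    total = sum(
        math.exp(-euclidean(pool[i], p) ** 2 / (2.0 * bandwidth ** 2))
        for p in others
    )
    return total / len(others)


def select_query_set(pool, posteriors, k, weights=(0.5, 0.3, 0.2)):
    """Greedily pick k pool indices by a weighted sum of the three terms.

    The weights are fixed, hypothetical values; 4DS adapts its weights to
    the estimated classifier performance, which is not modeled here.
    """
    w_dist, w_dens, w_div = weights
    k = min(k, len(pool))
    # Largest pairwise distance, used to scale diversity into [0, 1].
    max_d = max((euclidean(a, b) for a in pool for b in pool), default=0.0) or 1.0
    chosen = []
    while len(chosen) < k:
        best, best_score = None, -math.inf
        for i in range(len(pool)):
            if i in chosen:
                continue
            # Diversity: distance to the nearest sample already in the query set.
            nearest = min((euclidean(pool[i], pool[j]) for j in chosen), default=max_d)
            score = (
                w_dist * closeness_to_boundary(posteriors[i])
                + w_dens * density(i, pool)
                + w_div * nearest / max_d
            )
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Toy pool: three 2-D samples with class posteriors from some classifier.
pool = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
posteriors = [[0.9, 0.1], [0.55, 0.45], [0.8, 0.2]]
query_set = select_query_set(pool, posteriors, k=2)
```

In a pool-based loop, the samples indexed by `query_set` would be labeled, added to the training set, and the classifier retrained before the next selection round.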

Details

Language :
English
ISSN :
0020-0255
Volume :
230
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
85615603
Full Text :
https://doi.org/10.1016/j.ins.2012.11.015