GS4: Generating Synthetic Samples for Semi-Supervised Nearest Neighbor Classification
- Source :
- Lecture Notes in Computer Science, PAKDD Workshops (ISBN 9783319131856)
- Publication Year :
- 2014
- Publisher :
- Springer International Publishing, 2014.
Abstract
- In this paper, we propose a method to improve nearest neighbor classification accuracy in a semi-supervised setting. We call our approach GS4 (Generating Synthetic Samples Semi-Supervised). Existing self-training approaches classify unlabeled samples by exploiting local information and then incorporate them into the labeled training set. However, misclassifications made at an early stage propagate and severely degrade the classification accuracy. To address this problem, the proposed method exploits the unlabeled data to generate synthetic samples, using weights proportional to the classification confidence. Specifically, our scheme is inspired by the Synthetic Minority Over-Sampling Technique (SMOTE): each unlabeled sample is used to generate as many labeled samples as there are classes represented among its \(k\) nearest neighbors. The distance of each synthetic sample from its \(k\) nearest neighbors of the same class is proportional to the classification confidence. As a result, robustness to misclassification errors increases and better accuracy is achieved. Experimental results on publicly available datasets show that the proposed approach yields statistically significant improvements.
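- The abstract describes the generation step only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the confidence of a class is the fraction of an unlabeled sample's \(k\) nearest labeled neighbors carrying that label, and that each synthetic sample lies on the segment between the centroid of the same-class neighbors and the unlabeled sample, at a distance from that centroid proportional to the confidence. The helper names (`generate_synthetic`, `k_nearest`, `knn_predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def k_nearest(X, y, query, k):
    """Return the (point, label) pairs of the k labeled samples closest to query."""
    order = sorted(range(len(X)), key=lambda i: euclidean(X[i], query))
    return [(X[i], y[i]) for i in order[:k]]


def generate_synthetic(X, y, unlabeled, k=3):
    """Hypothetical GS4-style augmentation (an interpretation, not the paper's code).

    For each unlabeled point u, every class c present among its k nearest
    labeled neighbors yields one synthetic sample labeled c. ASSUMPTIONS:
    conf(c) is the fraction of those neighbors labeled c, and the sample is
    placed at s = m_c + conf * (u - m_c), where m_c is the centroid of the
    same-class neighbors, so that ||s - m_c|| = conf * ||u - m_c||.
    """
    synth_X, synth_y = [], []
    for u in unlabeled:
        neighbors = k_nearest(X, y, u, k)
        counts = Counter(label for _, label in neighbors)
        for c, n_c in counts.items():
            conf = n_c / len(neighbors)
            same = [p for p, label in neighbors if label == c]
            centroid = [sum(coord) / len(same) for coord in zip(*same)]
            # Low confidence keeps the sample near existing class-c data;
            # full confidence places it exactly at the unlabeled point.
            s = [m + conf * (ui - m) for m, ui in zip(centroid, u)]
            synth_X.append(s)
            synth_y.append(c)
    return synth_X, synth_y


def knn_predict(X, y, query, k=1):
    """Plain majority-vote k-NN classifier."""
    votes = Counter(label for _, label in k_nearest(X, y, query, k))
    return votes.most_common(1)[0][0]


# Toy usage: two labeled clusters and two unlabeled points.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = ["a", "a", "b", "b"]
unlabeled = [[1.0, 0.5], [4.0, 5.0]]

sX, sy = generate_synthetic(X, y, unlabeled, k=3)
aug_X, aug_y = X + sX, y + sy
print(knn_predict(aug_X, aug_y, [2.0, 1.0], k=1))  # prints "a"
```

Because every class among the neighbors receives a sample, a wrong majority vote no longer injects a single fully trusted mislabeled point; the minority-class sample is simply pulled close to that class's existing data. Whether the paper interpolates toward a centroid, toward individual neighbors, or uses a random SMOTE-style factor is not stated in the abstract.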
Details
- ISBN :
- 978-3-319-13185-6
- ISBNs :
- 9783319131856
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........0553b20a3705b6783a8636bcbafd71fa
- Full Text :
- https://doi.org/10.1007/978-3-319-13186-3_36