Analyzing the Potential of Zero-Shot Recognition for Document Image Classification
- Source :
- Document Analysis and Recognition – ICDAR 2021 ISBN: 9783030863364, ICDAR (4)
- Publication Year :
- 2021
- Publisher :
- Springer International Publishing, 2021.
Abstract
- Document image classification is one of the most important components of business automation workflows. A range of supervised image classification methods have therefore been proposed, but they rely on large amounts of labeled data, which are rarely available in practice. Furthermore, these models must be retrained whenever new classes are introduced. In this paper, we analyze the potential of zero-shot document image classification based on computing the agreement between the images and the textual embeddings of class names/descriptions. This enables document image classification models to be deployed without any training data and at zero training cost, while also allowing new classes to be integrated seamlessly. Our results show that zero-shot recognition achieves significantly better-than-chance performance on document image classification benchmarks (49.51% accuracy on Tobacco-3482, compared with 10% for a random classifier, and 39.22% on RVL-CDIP, compared with 6.25% for a random classifier). We also show that the representation learned by a vision transformer trained on image-text pairs is competitive with CNNs: a linear SVM trained on top of the pre-computed representations achieves performance comparable to state-of-the-art convolutional networks (85.74% on the Tobacco-3482 dataset, compared with 71.67% for an ImageNet-pretrained ResNet-50). Even though these initial results are encouraging, zero-shot recognition on document images still has a large gap to close, in contrast to natural scene image classification, where zero-shot recognition achieves performance comparable to fully supervised baselines. Our preliminary findings pave the way for the deployment of zero-shot document image classification in production settings.
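The zero-shot procedure summarized in the abstract (scoring an image against text embeddings of class names and picking the best match) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: cosine similarity is assumed as the "agreement" score, the three-dimensional vectors are hypothetical toy stand-ins for the outputs of an image encoder and a text encoder, and the function names are illustrative only. The `chance_accuracy` helper reproduces the random-classifier baselines quoted above (1/10 for the 10 Tobacco-3482 classes, 1/16 for the 16 RVL-CDIP classes).

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumed agreement score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb: Sequence[float],
                       class_text_embs: Dict[str, Sequence[float]]) -> str:
    """Return the class whose text embedding agrees best with the image.

    No training step is involved: supporting a new class only requires
    adding one more text embedding to `class_text_embs`.
    """
    return max(class_text_embs,
               key=lambda name: cosine_similarity(image_emb, class_text_embs[name]))


def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of a uniform random classifier."""
    return 1.0 / num_classes


# Hypothetical toy embeddings; real ones would come from the trained encoders.
class_text_embs = {
    "letter": [0.9, 0.1, 0.0],
    "invoice": [0.1, 0.9, 0.1],
    "scientific report": [0.0, 0.2, 0.9],
}
image_emb = [0.2, 0.8, 0.3]

print(zero_shot_classify(image_emb, class_text_embs))  # invoice
print(chance_accuracy(10), chance_accuracy(16))         # 0.1 0.0625
```

Because classification reduces to an argmax over class-name embeddings, the label set can change at deployment time without retraining, which is the property the abstract highlights.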
- Subjects :
- Contextual image classification
Computer science
Deep learning
Contrast (statistics)
Pattern recognition
Class (biology)
Range (mathematics)
Workflow
Classifier (linguistics)
Artificial intelligence
business
Feature learning
Details
- ISBN :
- 978-3-030-86336-4
- ISBNs :
- 9783030863364
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........bf21e859a6f4c4e13b9276f36b158e30
- Full Text :
- https://doi.org/10.1007/978-3-030-86337-1_20