
Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition.

Authors :
Yang, Xun
Chang, Tianyu
Zhang, Tianzhu
Wang, Shanshan
Hong, Richang
Wang, Meng
Source :
International Journal of Computer Vision. May 2024, p1-27.
Publication Year :
2024

Abstract

Modern deep neural networks are prone to learning domain-dependent shortcuts and thus usually suffer severe performance degradation when tested in unseen target domains, owing to their poor out-of-distribution generalization; this significantly limits real-world applications. The main cause is the domain shift arising from the large distribution gap between source and unseen target data. To this end, this paper takes a step towards training robust models for domain-generalizable visual tasks, focusing on learning domain-invariant visual representations that alleviate the domain shift. Specifically, we propose an effective Hierarchical Visual Transformation (HVT) network that (1) transforms each training sample hierarchically into new domains with diverse distributions at three levels: Global, Local, and Pixel, then (2) maximizes the visual discrepancy between the source domain and the new domains while minimizing the cross-domain feature inconsistency to capture domain-invariant features. We further enhance the HVT network by introducing environment-invariant learning: we enforce invariance of the visual representation across automatically inferred environments by minimizing an invariant learning loss defined as a weighted average of per-environment losses. In this way, we prevent the model from relying on spurious features for prediction, helping it learn domain-invariant representations and narrow the domain gap in various visual matching and recognition tasks, such as stereo matching, pedestrian retrieval, and image classification. We term the extended HVT "EHVT" to distinguish the two. We integrate our EHVT network into different models and evaluate its effectiveness and compatibility on several public benchmark datasets. Extensive experiments clearly show that EHVT can substantially enhance generalization performance across these tasks.
Our codes are available at https://github.com/cty8998/EHVT-VisualDG. [ABSTRACT FROM AUTHOR]
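The abstract describes two objectives: an invariant learning loss computed as a weighted average of per-environment losses, and a cross-domain consistency term minimized between source features and features of transformed (new-domain) views. The sketch below is an illustrative, simplified rendering of those two ideas in numpy; the function names, weighting scheme, and squared-distance consistency measure are assumptions for illustration, not the authors' actual implementation (see the linked repository for that).

```python
import numpy as np

def invariant_learning_loss(env_losses, env_weights):
    """Weighted average of per-environment losses -- a minimal sketch of
    the environment-invariant objective described in the abstract.
    Weights are normalized so the result is a convex combination."""
    w = np.asarray(env_weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, np.asarray(env_losses, dtype=float)))

def cross_domain_consistency(feat_src, feat_new):
    """Mean squared distance between source-domain features and the
    features of a hierarchically transformed view; minimizing this
    encourages domain-invariant representations."""
    diff = np.asarray(feat_src, dtype=float) - np.asarray(feat_new, dtype=float)
    return float(np.mean(diff ** 2))

# Toy usage: two inferred environments with equal weight,
# and a pair of 4-dim feature vectors from two domains.
total = invariant_learning_loss([1.0, 3.0], [1.0, 1.0])       # 2.0
consistency = cross_domain_consistency([0.0, 1.0], [0.0, 1.0])  # 0.0
```

In practice the per-environment losses would be task losses (e.g. matching or classification losses) computed on each inferred environment, and the features would come from the backbone network rather than toy vectors.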

Details

Language :
English
ISSN :
0920-5691
Database :
Academic Search Index
Journal :
International Journal of Computer Vision
Publication Type :
Academic Journal
Accession number :
177476157
Full Text :
https://doi.org/10.1007/s11263-024-02106-7