
SingleS2R: Single sample driven Sim-to-Real transfer for Multi-Source Visual-Tactile Information Understanding using multi-scale vision transformers.

Authors :
Tang, Jing
Gong, Zeyu
Tao, Bo
Yin, Zhouping
Source :
Information Fusion, Aug 2024, Vol. 108.
Publication Year :
2024

Abstract

Due to variations in light transmission and wear on the contact head, existing visual-tactile dataset building methods typically require a large amount of real-world data, making the dataset building process time-consuming and labor-intensive. Sim-to-Real learning has been proposed to realize Multi-Source Visual-Tactile Information Understanding (MSVTIU) across simulated and real environments, which can efficiently promote visual-tactile dataset building through simulation for emerging robotic applications. However, existing Sim-to-Real learning still requires more than 10,000 real samples, and the data must be re-collected whenever the sensor version changes. To address this challenge, we propose a powerful Sim-to-Real transfer method for MSVTIU that requires only a single real-world tactile sample. To effectively extract features from this single real tactile sample, a multi-scale vision transformer-based Generative Adversarial Network (GAN) is proposed to handle the MSVTIU task under extremely limited data. We introduce a novel scale-dependent self-attention mechanism that allows attention layers to adapt their behavior at different stages of the generation process. In addition, we introduce a residual block for capturing contextual information between adjacent scales, which uses shortcut connections to fully preserve texture and structure information. We further enhance the model's understanding of visual-tactile information using an elastic transform and an adaptive adversarial training strategy, both designed specifically for MSVTIU. Experiments on two public datasets with diverse objects show that our Sim-to-Real transfer approach, using only a single real-world visual-tactile sample, outperforms state-of-the-art methods that require tens of thousands of samples.
• Single-sample-driven, new state-of-the-art method.
• Our method outperforms others requiring 10,000+ samples.
• A scale-dependent self-attention mechanism to extract features efficiently.
• An adaptive residual block to better capture contextual features.
[ABSTRACT FROM AUTHOR]
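Because the abstract describes the architecture only at a high level, the PyTorch sketch below is purely illustrative: the module names, head-count schedule, channel sizes, and the way coarse-scale features are fused are assumptions rather than the authors' implementation. It sketches the two ideas the abstract highlights: an attention layer whose behavior depends on the generation scale, and a residual block whose shortcut connection carries contextual information between adjacent scales.

```python
# Illustrative sketch only; not the SingleS2R implementation.
import torch
import torch.nn as nn


class ScaleDependentSelfAttention(nn.Module):
    """Self-attention whose head count grows with the generation scale
    (assumed scheme: coarse scales model global structure, fine scales texture).
    `channels` must be divisible by the resulting head count."""

    def __init__(self, channels: int, scale_idx: int):
        super().__init__()
        num_heads = min(2 ** scale_idx, 8)  # assumed head schedule
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) generator feature map at one scale.
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)       # residual connection + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class CrossScaleResidualBlock(nn.Module):
    """Residual block whose shortcut fuses the upsampled coarser-scale features
    with the current scale, preserving texture and structure information."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # Bring the previous (coarser) scale to the current resolution,
        # then add it through the shortcut path before the convolutional body.
        coarse_up = nn.functional.interpolate(
            coarse, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return x + self.body(x + coarse_up)
```

For the augmentation side, an off-the-shelf elastic deformation such as `torchvision.transforms.ElasticTransform` could play the role of the elastic transform mentioned in the abstract, though the paper's exact parameterization is not given in this record.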

Details

Language :
English
ISSN :
15662535
Volume :
108
Database :
Academic Search Index
Journal :
Information Fusion
Publication Type :
Academic Journal
Accession number :
176811350
Full Text :
https://doi.org/10.1016/j.inffus.2024.102390