
Audio2DiffuGesture: Generating a diverse co-speech gesture based on a diffusion model.

Authors :
Yao, Hongze
Xu, Yingting
Wu, Weitao
He, Huabin
Ren, Wen
Cai, Zhiming
Source :
Electronic Research Archive; Sep 2024, Vol. 32, Issue 9, p1-17, 17p
Publication Year :
2024

Abstract

People use a combination of language and gestures to convey intentions, making the generation of natural co-speech gestures a challenging task. In audio-driven gesture generation, relying solely on features extracted from raw audio waveforms limits the model's ability to fully learn the joint distribution between audio and gestures. To address this limitation, we integrated key features from both raw audio waveforms and Mel-spectrograms. Specifically, we employed cascaded 1D convolutions to extract features from the audio waveform and a two-stage attention mechanism to capture features from the Mel-spectrogram. The fused features were then input into a Transformer with cross-dimension attention for sequence modeling, which mitigated accumulated non-autoregressive errors and reduced redundant information. We developed a diffusion model-based Audio to Diffusion Gesture (A2DG) generation pipeline capable of producing high-quality and diverse gestures. Our method demonstrated superior performance in extensive experiments compared to established baselines. On the TED Gesture and TED Expressive datasets, the Fréchet Gesture Distance (FGD) improved by 16.8% and 56%, respectively. Additionally, a user study validated that the co-speech gestures generated by our method are more vivid and realistic.
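The abstract describes a two-branch audio encoder: cascaded 1D convolutions over the raw waveform, an attention-based encoder over the Mel-spectrogram, and a fusion of the two feature streams that conditions the Transformer-based diffusion model. The snippet below is a minimal PyTorch sketch of that fusion idea only. Every module name (WaveformEncoder, MelEncoder, FusedAudioEncoder), layer size, sample rate, and frame count is an illustrative assumption rather than the authors' A2DG implementation, and the two stacked self-attention layers merely stand in for the paper's two-stage attention mechanism.

```python
# Minimal sketch of the two-branch audio feature fusion outlined in the
# abstract. All hyperparameters, module names, and shapes are assumptions;
# this is NOT the authors' A2DG code.
import torch
import torch.nn as nn


class WaveformEncoder(nn.Module):
    """Cascaded 1D convolutions over the raw audio waveform."""

    def __init__(self, out_dim=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4, padding=7), nn.GELU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=4, padding=7), nn.GELU(),
            nn.Conv1d(64, out_dim, kernel_size=15, stride=4, padding=7), nn.GELU(),
        )

    def forward(self, wav):                 # wav: (B, 1, samples)
        feat = self.convs(wav)              # (B, out_dim, T_wav)
        return feat.transpose(1, 2)         # (B, T_wav, out_dim)


class MelEncoder(nn.Module):
    """Two stacked self-attention stages over the Mel-spectrogram frames
    (a stand-in for the paper's two-stage attention mechanism)."""

    def __init__(self, n_mels=80, out_dim=128, heads=4):
        super().__init__()
        self.proj = nn.Linear(n_mels, out_dim)
        self.attn1 = nn.MultiheadAttention(out_dim, heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(out_dim, heads, batch_first=True)

    def forward(self, mel):                 # mel: (B, T_mel, n_mels)
        x = self.proj(mel)                  # (B, T_mel, out_dim)
        x, _ = self.attn1(x, x, x)          # stage 1: self-attention over frames
        x, _ = self.attn2(x, x, x)          # stage 2: a second attention pass
        return x                            # (B, T_mel, out_dim)


class FusedAudioEncoder(nn.Module):
    """Fuses both branches into one conditioning sequence."""

    def __init__(self, dim=128, n_frames=34):
        super().__init__()
        self.wave = WaveformEncoder(dim)
        self.mel = MelEncoder(out_dim=dim)
        self.n_frames = n_frames            # assumed number of gesture frames

    def forward(self, wav, mel):
        w = self.wave(wav)                  # (B, T_wav, dim)
        m = self.mel(mel)                   # (B, T_mel, dim)
        # Resample both branches to the gesture frame rate and concatenate.
        w = nn.functional.interpolate(w.transpose(1, 2), self.n_frames).transpose(1, 2)
        m = nn.functional.interpolate(m.transpose(1, 2), self.n_frames).transpose(1, 2)
        return torch.cat([w, m], dim=-1)    # (B, n_frames, 2 * dim)


if __name__ == "__main__":
    enc = FusedAudioEncoder()
    wav = torch.randn(2, 1, 16000)          # 1 s of 16 kHz audio (assumed rate)
    mel = torch.randn(2, 100, 80)           # 100 Mel frames with 80 bins
    print(enc(wav, mel).shape)              # torch.Size([2, 34, 256])
```

In the full A2DG pipeline as described, this fused sequence would condition the Transformer with cross-dimension attention and the diffusion-based gesture generator, neither of which is sketched here.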

Details

Language :
English
ISSN :
2688-1594
Volume :
32
Issue :
9
Database :
Complementary Index
Journal :
Electronic Research Archive
Publication Type :
Academic Journal
Accession number :
180175596
Full Text :
https://doi.org/10.3934/era.2024250