
3D head-talk: speech synthesis 3D head movement face animation.

Authors :
Yang, Daowu
Li, Ruihui
Yang, Qi
Peng, Yuyi
Huang, Xibei
Zou, Jing
Source :
Soft Computing - A Fusion of Foundations, Methodologies & Applications. Jan 2024, Vol. 28, Issue 1, p363-379. 17p.
Publication Year :
2024

Abstract

Speech-driven 3D facial animation has made remarkable progress. However, synthesizing 3D talking faces with head motion remains an open problem, because head motion is a speech-independent appearance cue that is difficult to model with a speech-driven approach. To address this, we propose 3D head-talk, which generates 3D face animations combined with extreme head motion. A key challenge in this work is generating natural head movements that match the speech rhythm. We first build an end-to-end autoregressive model that combines a dual-tower and a single-tower Transformer: a speech encoder encodes the long-term audio context, a facial mesh encoder encodes subtle changes in the vertices of the 3D face mesh, and a single-tower decoder autoregressively predicts a sequence of 3D facial animation meshes. Next, the predicted 3D facial animation sequence is edited by a motion field generator that injects head motion, yielding an output sequence with extreme head motion. Finally, the natural 3D face animation under extreme head motion is presented in sync with the input audio. Quantitative and qualitative results show that our method outperforms current state-of-the-art methods and stabilizes the non-face region while preserving the appearance of extreme head motion. [ABSTRACT FROM AUTHOR]
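The pipeline the abstract describes (dual-tower encoders for speech and mesh, a single-tower autoregressive Transformer decoder, then a head-motion editing pass) can be pictured with the minimal PyTorch sketch below. Everything in it is an assumption made for illustration: the module names, the dimensions (e.g. 5023 vertices, a 256-dim model), and the per-frame rigid transform standing in for the paper's motion field generator are not taken from the paper, whose implementation this record does not include.

```python
# Minimal sketch (assumptions throughout): dual-tower encoders feeding a
# single-tower autoregressive Transformer decoder, then a head-motion edit.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Tower 1: encodes long-term audio context (hypothetical layout)."""
    def __init__(self, audio_dim=128, d_model=256, n_layers=4):
        super().__init__()
        self.proj = nn.Linear(audio_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, audio):                    # (B, T, audio_dim)
        return self.encoder(self.proj(audio))    # (B, T, d_model)

class MeshEncoder(nn.Module):
    """Tower 2: encodes per-frame 3D face mesh vertices (hypothetical layout)."""
    def __init__(self, n_verts=5023, d_model=256):
        super().__init__()
        self.proj = nn.Linear(n_verts * 3, d_model)

    def forward(self, verts):                    # (B, T, n_verts, 3)
        B, T = verts.shape[:2]
        return self.proj(verts.reshape(B, T, -1))  # (B, T, d_model)

class MeshDecoder(nn.Module):
    """Single tower: cross-attends to audio tokens, predicts mesh frames."""
    def __init__(self, n_verts=5023, d_model=256, n_layers=4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_verts * 3)

    def forward(self, mesh_tokens, audio_tokens):
        T = mesh_tokens.size(1)                  # causal mask keeps it autoregressive
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        out = self.decoder(mesh_tokens, audio_tokens, tgt_mask=mask)
        return self.head(out).reshape(out.size(0), T, -1, 3)

def apply_head_motion(verts, rot, trans):
    """Stand-in for the motion field generator: a per-frame rigid transform
    (rotation + translation) injected into the predicted mesh sequence."""
    # verts: (B, T, V, 3); rot: (B, T, 3, 3); trans: (B, T, 3)
    return torch.einsum('btij,btvj->btvi', rot, verts) + trans.unsqueeze(2)

# Smoke test with made-up shapes: 2 clips, 50 frames, 5023 vertices.
audio = torch.randn(2, 50, 128)
prev_meshes = torch.randn(2, 50, 5023, 3)
pred = MeshDecoder()(MeshEncoder()(prev_meshes), SpeechEncoder()(audio))
moved = apply_head_motion(pred,
                          torch.eye(3).expand(2, 50, 3, 3),
                          torch.zeros(2, 50, 3))
print(pred.shape, moved.shape)                   # both (2, 50, 5023, 3)
```

At inference time such a decoder would run frame by frame, feeding each predicted mesh back in as the next input token; the teacher-forced call above only illustrates the tensor flow between the three modules and the motion edit.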

Details

Language :
English
ISSN :
1432-7643
Volume :
28
Issue :
1
Database :
Academic Search Index
Journal :
Soft Computing - A Fusion of Foundations, Methodologies & Applications
Publication Type :
Academic Journal
Accession number :
174601082
Full Text :
https://doi.org/10.1007/s00500-023-09292-5