A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation

S Nyatsanga, T Kucherenko, C Ahuja… - Computer Graphics …, 2023 - Wiley Online Library
Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …

Deep person generation: A survey from the perspective of face, pose, and cloth synthesis

T Sha, W Zhang, T Shen, Z Li, T Mei - ACM Computing Surveys, 2023 - dl.acm.org
Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …

BEAT: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis

H Liu, Z Zhu, N Iwamoto, Y Peng, Z Li, Y Zhou… - European conference on …, 2022 - Springer
Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned
on multi-modal data is still an unsolved problem due to the lack of available datasets …

QPGesture: Quantization-based and phase-guided motion matching for natural speech-driven gesture generation

S Yang, Z Wu, M Li, Z Zhang, L Hao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven gesture generation is highly challenging due to the random jitters of human
motion. In addition, there is an inherent asynchronous relationship between human speech …

Extrovert or Introvert? GAN-Based Humanoid Upper-Body Gesture Generation for Different Impressions

B Wu, C Liu, CT Ishi, J Shi, H Ishiguro - International Journal of Social …, 2023 - Springer
Gestures, a form of body language, significantly influence how users perceive humanoid
robots. Recent data-driven methods for co-speech gestures have successfully enhanced the …

DR2: Disentangled recurrent representation learning for data-efficient speech video synthesis

C Zhang, C Wang, Y Zhao, S Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although substantial progress has been made in audio-driven talking video synthesis, there
still remain two major difficulties: existing works 1) need a long sequence of training dataset …

Multimodal attention for lip synthesis using conditional generative adversarial networks

A Vidal, C Busso - Speech Communication, 2023 - Elsevier
The synthesis of lip movements is an important problem for a socially interactive agent (SIA).
It is important to generate lip movements that are synchronized with speech and have …

Close encounters with the virtual kind: Defining a human-virtual agent coexistence framework

J Arsenyan, A Mirowska, A Piepenbrink - Technological Forecasting and …, 2023 - Elsevier
Virtual agent research has evolved into a substantial body of work, albeit one with a
fragmented structure and overlapping, and at times inconsistent, definitions and results. The …

Large motion model for unified multi-modal motion generation

M Zhang, D Jin, C Gu, F Hong, Z Cai, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Human motion generation, a cornerstone technique in animation and video production, has
widespread applications in various tasks like text-to-motion and music-to-dance. Previous …

The design and observed effects of robot-performed manual gestures: A systematic review

J De Wit, P Vogt, E Krahmer - ACM Transactions on Human-Robot …, 2023 - dl.acm.org
Communication using manual (hand) gestures is considered a defining property of social
robots, and their physical embodiment and presence, therefore, we see a need for a …