The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings

T Kucherenko, R Nagy, Y Yoon, J Woo… - Proceedings of the 25th …, 2023 - dl.acm.org
This paper reports on the GENEA Challenge 2023, in which participating teams built speech-
driven gesture-generation systems using the same speech and motion dataset, followed by …

Chain of generation: Multi-modal gesture synthesis via cascaded conditional control

Z Xu, Y Zhang, S Yang, R Li, X Li - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
This study aims to improve the generation of 3D gestures by utilizing multimodal information
from human speech. Previous studies have focused on incorporating additional modalities …

MambaTalk: Efficient holistic gesture synthesis with selective state space models

Z Xu, Y Lin, H Han, S Yang, R Li, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Gesture synthesis is a vital realm of human-computer interaction, with wide-ranging
applications across various fields like film, robotics, and virtual reality. Recent advancements …

Freetalker: Controllable speech and text-driven gesture generation based on diffusion models for enhanced speaker naturalness

S Yang, Z Xu, H Xue, Y Cheng, S Huang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Current talking avatars mostly generate co-speech gestures based on audio and text of the
utterance, without considering the non-speaking motion of the speaker. Furthermore …

MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion

C Fu, Y Wang, J Zhang, Z Jiang, X Mao, J Wu… - Proceedings of the …, 2024 - dl.acm.org
Co-speech gesture generation is crucial for producing synchronized and realistic human
gestures that accompany speech, enhancing the animation of lifelike avatars in virtual …

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

S Mehta, A Deichler, J O'Regan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although humans engaged in face-to-face conversation simultaneously communicate both
verbally and non-verbally, methods for joint and unified synthesis of speech audio and co …

MDT-A2G: Exploring masked diffusion transformers for co-speech gesture generation

X Mao, Z Jiang, Q Wang, C Fu, J Zhang, J Wu… - Proceedings of the …, 2024 - dl.acm.org
Recent advancements in the field of Diffusion Transformers have substantially improved the
generation of high-quality 2D images, 3D videos, and 3D shapes. However, the …

Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models

H Xue, S Yang, Z Zhang, Z Wu, M Li… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Audio-driven co-speech human gesture generation has made remarkable advancements
recently. However, most previous works only focus on single person audio-driven gesture …

Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

F Zhang, Z Wang, X Lyu, S Zhao, M Li… - … on Visualization and …, 2024 - ieeexplore.ieee.org
Speech-driven gesture generation is an emerging field within virtual human creation.
However, a significant challenge lies in accurately determining and processing the multitude …

DiT-Gesture: A Speech-Only Approach to Stylized Gesture Generation

F Zhang, Z Wang, X Lyu, N Ji, S Zhao, F Gao - Electronics, 2024 - mdpi.com
The generation of co-speech gestures for digital humans is an emerging area in the field of
virtual human creation. Prior research has progressed by using acoustic and semantic …