Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

X Qi, J Pan, P Li, R Yuan, X Chi, M Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation
in human-machine interaction applications. While the existing methods enable generating …

Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

Z Zhang, T Ao, Y Zhang, Q Gao, C Lin… - ACM Transactions on …, 2024 - dl.acm.org
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize
realistic gestures accompanying speech with strong semantic correspondence. Semantically …

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models

Z Xu, Y Lin, H Han, S Yang, R Li, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Gesture synthesis is a vital realm of human-computer interaction, with wide-ranging
applications across various fields like film, robotics, and virtual reality. Recent advancements …

Towards Variable and Coordinated Holistic Co-Speech Motion Generation

Y Liu, Q Cao, Y Wen, H Jiang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper addresses the problem of generating lifelike holistic co-speech motions for 3D
avatars focusing on two key aspects: variability and coordination. Variability allows the …

EGGesture: Entropy-Guided Vector Quantized Variational AutoEncoder for Co-Speech Gesture Generation

Y Xiao, K Shu, H Zhang, B Yin, WS Cheang… - Proceedings of the …, 2024 - dl.acm.org
Co-speech gesture generation encounters challenges with imbalanced, long-tailed gesture
distributions. While recent methods typically address this by employing Vector Quantized …

ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE

S Wu, KI Haque, Z Yumak - Proceedings of the 17th ACM SIGGRAPH …, 2024 - dl.acm.org
Audio-driven 3D facial animation synthesis has been an active field of research with
attention from both academia and industry. While there are promising results in this area …

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

J Chen, X Yan, Y Chen, S Cen, Q Ma, H Zhen… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce a challenging task for simultaneously generating 3D holistic body
motions and singing vocals directly from textual lyrics inputs, advancing beyond existing …

VCoME: Verbal Video Composition with Multimodal Editing Effects

W Gong, X Jin, X Li, D He, X Wu - arXiv preprint arXiv:2407.04697, 2024 - arxiv.org
Verbal videos, featuring voice-overs or text overlays, provide valuable content but present
significant challenges in composition, especially when incorporating editing effects to …

Realistic-Gesture: Co-Speech Gesture Video Generation through Context-Aware Gesture Representation

LS Generation - openreview.net
Co-speech gesture generation is crucial for creating lifelike avatars and enhancing human-
computer interactions by synchronizing gestures with speech in computer vision. Despite …