Learning hierarchical cross-modal association for co-speech gesture generation

W Zhu, X Ma, D Ro, H Ci, J Zhang, J Shi… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Human motion generation aims to generate natural human pose sequences and shows
immense potential for real-world applications. Substantial progress has been made recently …

被引用次数：56 相关文章所有 8 个版本

[PDF] thecvf.com

Humangaussian: Text-driven 3d human generation with gaussian splatting

X Liu, X Zhan, J Tang, Y Shan, G Zeng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Realistic 3D human generation from text prompts is a desirable yet challenging task.
Existing methods optimize 3D representations like mesh or neural fields via score distillation …

被引用次数：69 相关文章所有 3 个版本

[PDF] thecvf.com

Taming diffusion models for audio-driven co-speech gesture generation

L Zhu, X Liu, X Liu, R Qian, Z Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Animating virtual avatars to make co-speech gestures facilitates various applications in
human-machine interaction. The existing methods mainly rely on generative adversarial …

被引用次数：99 相关文章所有 7 个版本

[PDF] arxiv.org

Gesturediffuclip: Gesture diffusion model with clip latents

T Ao, Z Zhang, L Liu - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org

The automatic generation of stylized co-speech gestures has recently received increasing
attention. Previous systems typically allow style control via predefined text labels or example …

被引用次数：129 相关文章所有 3 个版本

[PDF] thecvf.com

Generating holistic 3d human motion from speech

H Yi, H Liang, Y Liu, Q Cao, Y Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com

This work addresses the problem of generating 3D holistic body motions from human
speech. Given a speech recording, we synthesize sequences of 3D body poses, hand …

被引用次数：127 相关文章所有 7 个版本

[PDF] thecvf.com

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com

Generating expressive talking heads is essential for creating virtual humans. However,
existing one-or few-shot methods focus on lip-sync and head motion, ignoring the emotional …

被引用次数：134 相关文章所有 4 个版本

[PDF] arxiv.org

Rhythmic gesticulator: Rhythm-aware co-speech gesture synthesis with hierarchical neural embeddings

T Ao, Q Gao, Y Lou, B Chen, L Liu - ACM Transactions on Graphics …, 2022 - dl.acm.org

Automatic synthesis of realistic co-speech gestures is an increasingly important yet
challenging task in artificial embodied agent creation. Previous systems mainly focus on …

被引用次数：108 相关文章所有 3 个版本

[PDF] arxiv.org

Semantic-aware implicit neural audio-driven video portrait generation

X Liu, Y Xu, Q Wu, H Zhou, W Wu, B Zhou - European conference on …, 2022 - Springer

Animating high-fidelity video portrait with speech audio is crucial for virtual reality and digital
entertainment. While most previous studies rely on accurate explicit structural information …

被引用次数：140 相关文章所有 5 个版本

[PDF] thecvf.com

Livelyspeaker: Towards semantic-aware co-speech gesture generation

Y Zhi, X Cun, X Chen, X Shen, W Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com

Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …

被引用次数：18 相关文章所有 6 个版本

[PDF] arxiv.org

Large motion model for unified multi-modal motion generation

M Zhang, D Jin, C Gu, F Hong, Z Cai, J Huang… - … on Computer Vision, 2025 - Springer

Human motion generation, a cornerstone technique in animation and video production, has
widespread applications in various tasks like text-to-motion and music-to-dance. Previous …

被引用次数：13 相关文章所有 2 个版本

高级搜索

QQ 群