Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

FreeMotion: A unified framework for number-free text-to-motion synthesis

K Fan, J Tang, W Cao, R Yi, M Li, J Gong… - … on Computer Vision, 2024 - Springer
Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in
their universality, as they are tailored for single-person or two-person scenarios and cannot …

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

J Kim, J Cho, J Park, S Hwang, DE Kim, G Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of
applications. Despite recent advancements in achieving realistic lip motion, current methods …

TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation

Y Zhuang, C Ma, Y Cheng, X Cheng, J Liao… - arXiv preprint arXiv …, 2025 - arxiv.org
Although significant progress has been made in the field of speech-driven 3D facial
animation recently, the speech-driven animation of an indispensable facial component, eye …

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Y Bian, A Zeng, X Ju, X Liu, Z Zhang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Whole-body multimodal motion generation, controlled by text, speech, or music, has
numerous applications including video generation and character animation. However …

Music-stylized hierarchical dance synthesis with user control

Y Cheng, Y Jiang, Y Wang - Virtual Reality & Intelligent Hardware, 2024 - Elsevier
Background: Synthesizing dance motions to match musical inputs is a significant challenge
in animation research. Compared to functional human motions, such as locomotion, dance …

Diverse Code Query Learning for Speech-Driven Facial Animation

C Gu, S Kuriyama, K Hotta - arXiv preprint arXiv:2409.19143, 2024 - arxiv.org
Speech-driven facial animation aims to synthesize lip-synchronized 3D talking faces
following the given speech signal. Prior methods to this task mostly focus on pursuing …

Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture

O Chojnowski, A Eberhard, M Schiffmann… - arXiv preprint arXiv …, 2025 - arxiv.org
Socially interactive agents are gaining prominence in domains like healthcare, education,
and service contexts, particularly virtual agents due to their inherent scalability. To facilitate …