A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Autodecoding latent 3d diffusion models

E Ntavelis, A Siarohin, K Olszewski… - Advances in …, 2023 - proceedings.neurips.cc
Diffusion-based methods have shown impressive visual results in the text-to-image domain.
They first learn a latent space using an autoencoder, then run a denoising process on the …

Portraitbooth: A versatile portrait model for fast identity-preserved personalization

X Peng, J Zhu, B Jiang, Y Tai, D Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in personalized image generation using diffusion models have been
noteworthy. However, existing methods suffer from inefficiencies due to the requirement for …

MedShapeNet--A large-scale dataset of 3D medical shapes for computer vision

J Li, A Pepe, C Gsaxner, G Luijten, Y Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs,
vessels) and 3D surgical instrument models. Prior to the deep learning era, the broad …

Laughtalk: Expressive 3d talking head generation with laughter

K Sung-Bin, L Hyun, DH Hong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Laughter is a unique expression, essential to affirmative social interactions of humans.
Although current 3D talking head generation methods produce convincing verbal …

Ft2tf: First-person statement text-to-talking face generation

X Diao, M Cheng, W Barrios, SY Jin - arXiv preprint arXiv:2312.05430, 2023 - arxiv.org
Talking face generation has gained immense popularity in the computer vision community,
with various applications including AR/VR, teleconferencing, digital assistants, and avatars …

Agentavatar: Disentangling planning, driving and rendering for photorealistic avatar agents

D Wang, B Dai, Y Deng, B Wang - arXiv preprint arXiv:2311.17465, 2023 - arxiv.org
In this study, our goal is to create interactive avatar agents that can autonomously plan and
animate nuanced facial movements realistically, from both visual and behavioral …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

From Sora What We Can See: A Survey of Text-to-Video Generation

R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With impressive achievements made, artificial intelligence is on the path forward to artificial
general intelligence. Sora, developed by OpenAI, which is capable of minute-level world …

3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow

F Taubner, P Raina, M Tuli, EW Teh… - Proceedings of the …, 2024 - openaccess.thecvf.com
When working with 3D facial data, improving fidelity and avoiding the uncanny valley effect is
critically dependent on accurate 3D facial performance capture. Because such methods are …