EchoMimic: Lifelike audio-driven portrait animations through editable landmark conditions

Z Chen, J Cao, Z Chen, Y Li, C Ma - arXiv preprint arXiv:2407.08136, 2024 - arxiv.org
The area of portrait image animation, propelled by audio input, has witnessed notable
progress in the generation of lifelike and dynamic portraits. Conventional methods are …

ChronoMagic-Bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (e.g., Sora and …

DiT4Edit: Diffusion transformer for image editing

K Feng, Y Ma, B Wang, C Qi, H Chen, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advances in UNet-based image editing, methods for shape-aware object
editing in high-resolution images are still lacking. Compared to UNet, Diffusion Transformers …

InstantSwap: Fast customized concept swapping across sharp shape differences

C Zhu, K Li, Y Ma, L Tang, C Fang, C Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Customized Concept Swapping (CCS) enable a text-to-image model to
swap a concept in the source image with a customized target concept. However, the existing …

MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

S Yang, H Li, J Wu, M Jing, L Li, R Ji, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated superior performance in the field of portrait animation.
However, current approaches rely on either the visual or the audio modality to control character …

INFP: Audio-driven interactive head generation in dyadic conversations

Y Zhu, L Zhang, Z Rong, T Hu, S Liang, Z Ge - arXiv preprint arXiv …, 2024 - arxiv.org
Imagine having a conversation with a socially intelligent agent. It can attentively listen to
your words and offer visual and linguistic feedback promptly. This seamless interaction …

Noise Calibration: Plug-and-Play Content-Preserving Video Enhancement Using Pre-trained Video Diffusion Models

Q Yang, H Chen, Y Zhang, M Xia, X Cun, Z Su… - … on Computer Vision, 2024 - Springer
To improve the quality of synthesized videos, one currently predominant method
involves retraining an expert diffusion model and then implementing a noising-denoising …

ReFIR: Grounding large restoration models with retrieval augmentation

H Guo, T Dai, Z Ouyang, T Zhang, Y Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in diffusion-based Large Restoration Models (LRMs) have significantly
improved photo-realistic image restoration by leveraging the internal knowledge embedded …

Follow-Your-Canvas: Higher-resolution video outpainting with extensive content generation

Q Chen, Y Ma, H Wang, J Yuan, W Zhao, Q Tian… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores higher-resolution video outpainting with extensive content generation.
We point out common issues faced by existing methods when attempting to largely outpaint …

Human motion video generation: A survey

H Xue, X Luo, Z Hu, X Zhang, X Xiang, Y Dai, J Liu… - Authorea …, 2024 - techrxiv.org
Human motion video generation has garnered significant research interest due to its broad
applications, enabling innovations such as photorealistic singing heads or dynamic avatars …