VideoComposer: Compositional video synthesis with motion controllability

DJ Zhang, D Li, H Le, MZ Shou, C Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org

Most existing video diffusion models (VDMs) are limited to mere text conditions. Thereby,
they are usually lacking in control over visual appearance and geometry structure of the …

Cited by 10 - Related articles

StreamingT2V: Consistent, dynamic, and extendable long video generation from text

R Henschel, L Khachatryan, D Hayrapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video diffusion models enable the generation of high-quality videos that follow text
instructions, making it easy to create diverse and individual content. However, existing …

Cited by 13 - Related articles - All 2 versions

VideoDrafter: Content-consistent multi-scene video generation with LLM

F Long, Z Qiu, T Yao, T Mei - arXiv preprint arXiv:2401.01256, 2024 - arxiv.org

The recent innovations and breakthroughs in diffusion models have significantly expanded
the possibilities of generating high-quality videos for the given prompts. Most existing works …

Cited by 9 - Related articles - All 2 versions

360DVD: Controllable panorama video generation with 360-degree video diffusion model

Q Wang, W Li, C Mou, X Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Panorama video recently attracts more interest in both study and application courtesy of its
immersive experience. Due to the expensive cost of capturing 360-degree panoramic videos …

Cited by 2 - Related articles - All 3 versions

VDT: General-purpose video diffusion transformers via mask modeling

H Lu, G Yang, N Fei, Y Huo, Z Lu, P Luo… - arXiv preprint arXiv …, 2023 - arxiv.org

This work introduces Video Diffusion Transformer (VDT), which pioneers the use of
transformers in diffusion-based video generation. It features transformer blocks with …

Cited by 15 - Related articles - All 3 versions

LivePhoto: Real image animation with text-guided motion control

X Chen, Z Liu, M Chen, Y Feng, Y Liu, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org

Despite the recent progress in text-to-video generation, existing studies usually overlook the
issue that only spatial contents but not temporal motions in synthesized videos are under the …

Cited by 12 - Related articles - All 2 versions

Direct-a-Video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

Cited by 11 - Related articles - All 2 versions

TC4D: Trajectory-conditioned text-to-4D generation

S Bahmani, X Liu, Y Wang, I Skorokhodov… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using
supervision from pre-trained text-to-video models. However, existing representations for …

Cited by 6 - Related articles - All 3 versions

MaskINT: Video editing via interpolative non-autoregressive masked transformers

H Ma, S Mahdizadehaghdam, B Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advances in generative AI have significantly enhanced image and video editing
particularly in the context of text prompt control. State-of-the-art approaches predominantly …

Cited by 2 - Related articles - All 4 versions

RenAIssance: A survey into AI text-to-image generation in the era of large model

F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-to-image generation (TTI) refers to the usage of models that could process text input
and generate high fidelity images based on text descriptions. Text-to-image generation …

Cited by 13 - Related articles - All 2 versions
