MoonShot: Towards controllable video generation and editing with multimodal conditions

DJ Zhang, D Li, H Le, MZ Shou, C Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Most existing video diffusion models (VDMs) are limited to mere text conditions. Thereby,
they are usually lacking in control over visual appearance and geometry structure of the …

StreamingT2V: Consistent, dynamic, and extendable long video generation from text

R Henschel, L Khachatryan, D Hayrapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video diffusion models enable the generation of high-quality videos that follow text
instructions, making it easy to create diverse and individual content. However, existing …

VideoDrafter: Content-consistent multi-scene video generation with LLM

F Long, Z Qiu, T Yao, T Mei - arXiv preprint arXiv:2401.01256, 2024 - arxiv.org
The recent innovations and breakthroughs in diffusion models have significantly expanded
the possibilities of generating high-quality videos for the given prompts. Most existing works …

360DVD: Controllable panorama video generation with 360-degree video diffusion model

Q Wang, W Li, C Mou, X Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Panorama video recently attracts more interest in both study and application courtesy of its
immersive experience. Due to the expensive cost of capturing 360-degree panoramic videos …

VDT: General-purpose video diffusion transformers via mask modeling

H Lu, G Yang, N Fei, Y Huo, Z Lu, P Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of
transformers in diffusion-based video generation. It features transformer blocks with …

LivePhoto: Real image animation with text-guided motion control

X Chen, Z Liu, M Chen, Y Feng, Y Liu, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the recent progress in text-to-video generation, existing studies usually overlook the
issue that only spatial contents but not temporal motions in synthesized videos are under the …

Direct-a-Video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

TC4D: Trajectory-conditioned text-to-4D generation

S Bahmani, X Liu, Y Wang, I Skorokhodov… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using
supervision from pre-trained text-to-video models. However, existing representations for …

MaskINT: Video editing via interpolative non-autoregressive masked transformers

H Ma, S Mahdizadehaghdam, B Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in generative AI have significantly enhanced image and video editing
particularly in the context of text prompt control. State-of-the-art approaches predominantly …

RenAIssance: A survey into AI text-to-image generation in the era of large model

F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-to-image generation (TTI) refers to the usage of models that could process text input
and generate high fidelity images based on text descriptions. Text-to-image generation …