StreamingT2V: Consistent, dynamic, and extendable long video generation from text

R Henschel, L Khachatryan, D Hayrapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video diffusion models enable the generation of high-quality videos that follow text
instructions, making it easy to create diverse and individual content. However, existing …

From Sora What We Can See: A Survey of Text-to-Video Generation

R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With impressive achievements made, artificial intelligence is on the path forward to artificial
general intelligence. Sora, developed by OpenAI, which is capable of minute-level world …

AI-Generated Videos and Deepfakes: A Technical Primer

A Sufian - Authorea Preprints, 2024 - techrxiv.org
Artificial intelligence, especially deep learning (DL)-based computer vision algorithms, has
been revolutionizing video generation, enabling the creation of realistic videos through …

Autoregressive Models in Vision: A Survey

J Xiong, G Liu, L Huang, C Wu, T Wu, Y Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …

VideoTetris: Towards Compositional Text-to-Video Generation

Y Tian, L Yang, H Yang, Y Gao, Y Deng, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated great success in text-to-video (T2V) generation.
However, existing methods may face challenges when handling complex (long) video …

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

K Gao, J Shi, H Zhang, C Wang, J Xiao - arXiv preprint arXiv:2406.10981, 2024 - arxiv.org
With the advance of diffusion models, today's video generation has achieved impressive
quality. But generating temporally consistent long videos is still challenging. A majority of …

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

B Li, C Zheng, W Zhu, J Mai, B Zhang, P Wonka… - arXiv preprint arXiv …, 2024 - arxiv.org
While diffusion models have shown impressive performance in 2D image/video generation,
diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The …

Video-to-Audio Generation with Fine-grained Temporal Semantics

Y Hu, Y Gu, C Li, R Chen, D Yu - arXiv preprint arXiv:2409.14709, 2024 - arxiv.org
With recent advances of AIGC, video generation has gained a surge of research interest in
both academia and industry (e.g., Sora). However, it remains a challenge to produce …

SEA: State-Exchange Attention for High-Fidelity Physics-Based Transformers

P Esmati, A Dadashzadeh, V Goodarzi… - arXiv preprint arXiv …, 2024 - arxiv.org
Current approaches using sequential networks have shown promise in estimating field
variables for dynamical systems, but they are often limited by high rollout errors. The …

A Survey on Vision Autoregressive Model

K Jiang, J Huang - arXiv preprint arXiv:2411.08666, 2024 - arxiv.org
Autoregressive models have demonstrated great performance in natural language
processing (NLP) with impressive scalability, adaptability and generalizability. Inspired by …