Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

When does Sora show: The beginning of TAO to imaginative intelligence and scenarios engineering

FY Wang, Q Miao, L Li, Q Ni, X Li, J Li… - IEEE/CAA Journal of …, 2024 - ieeexplore.ieee.org
During our discussion at workshops for writing “What Does ChatGPT Say: The DAO from
Algorithmic Intelligence to Linguistic Intelligence”[1], we had expected the next milestone for …

Sora for scenarios engineering of intelligent vehicles: V&V, C&C, and beyonds

X Li, Q Miao, L Li, Y Hou, Q Ni, L Fan… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The advent of Scenarios Engineering (SE) paves the way to a new era of intelligent vehicles
(IVs), driven by Artificial Intelligence (AI)-enabled strategies. It aims at shaping the IVs to be …

V3D: Video diffusion models are effective 3D generators

Z Chen, Y Wang, F Wang, Z Wang, H Liu - arXiv preprint arXiv:2403.06738, 2024 - arxiv.org
Automatic 3D generation has recently attracted widespread attention. Recent methods have
greatly accelerated the generation speed, but usually produce less-detailed objects due to …

AnyV2V: A plug-and-play framework for any video-to-video editing tasks

M Ku, C Wei, W Ren, H Yang, W Chen - arXiv preprint arXiv:2403.14468, 2024 - arxiv.org
Video-to-video editing involves editing a source video along with additional control (such as
text prompts, subjects, or styles) to generate a new video that aligns with the source video …

CameraCtrl: Enabling camera control for text-to-video generation

H He, Y Xu, Y Guo, G Wetzstein, B Dai, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Controllability plays a crucial role in video generation since it allows users to create desired
content. However, existing models largely overlooked the precise control of camera pose …

VASA-1: Lifelike audio-driven talking faces generated in real time

S Xu, G Chen, YX Guo, J Yang, C Li, Z Zang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VASA, a framework for generating lifelike talking faces with appealing visual
affective skills (VAS) given a single static image and a speech audio clip. Our premiere …