- Academic resource search

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net

We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Cited by 264 · Related articles · All 5 versions

[PDF] neurips.cc

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc

Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

Cited by 937 · Related articles · All 8 versions

[PDF] neurips.cc

MCVD - Masked conditional video diffusion for prediction, generation, and interpolation

V Voleti, A Jolicoeur-Martineau… - Advances in neural …, 2022 - proceedings.neurips.cc

Video prediction is a challenging task. The quality of video frames from current
state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data …

Cited by 188 · Related articles · All 9 versions

[PDF] neurips.cc

Flexible diffusion modeling of long videos

W Harvey, S Naderiparizi, V Masrani… - Advances in …, 2022 - proceedings.neurips.cc

We present a framework for video modeling based on denoising diffusion probabilistic
models that produces long-duration video completions in a variety of realistic environments …

Cited by 193 · Related articles · All 8 versions

[PDF] thecvf.com

MAGVIT: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …

Cited by 100 · Related articles · All 8 versions

[PDF] mdpi.com

Diffusion probabilistic modeling for video generation

R Yang, P Srivastava, S Mandt - Entropy, 2023 - mdpi.com

Denoising diffusion probabilistic models are a promising new class of generative models
that mark a milestone in high-quality image generation. This paper showcases their ability to …

Cited by 186 · Related articles · All 9 versions

[PDF] arxiv.org

GAIA-1: A generative world model for autonomous driving

A Hu, L Russell, H Yeo, Z Murez, G Fedoseev… - arXiv preprint arXiv …, 2023 - arxiv.org

Autonomous driving promises transformative improvements to transportation, but building
systems capable of safely navigating the unstructured complexity of real-world scenarios …

Cited by 83 · Related articles · All 2 versions

[PDF] arxiv.org

MaskViT: Masked visual pre-training for video prediction

A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín… - arXiv preprint arXiv …, 2022 - arxiv.org

The ability to predict future visual observations conditioned on past observations and motor
commands can enable embodied agents to plan solutions to a variety of tasks in complex …

Cited by 101 · Related articles · All 4 versions

[PDF] mlr.press

Reinforcement learning with action-free pre-training from videos

Y Seo, K Lee, SL James… - … Conference on Machine …, 2022 - proceedings.mlr.press

Recent unsupervised pre-training methods have shown to be effective on language and
vision domains by learning useful representations for multiple downstream tasks. In this …

Cited by 91 · Related articles · All 5 versions

[PDF] arxiv.org

DALL-E-Bot: Introducing web-scale diffusion models to robotics

I Kapelyukh, V Vosylius, E Johns - IEEE Robotics and …, 2023 - ieeexplore.ieee.org

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot
enables a robot to rearrange objects in a scene, by first inferring a text description of those …

Cited by 92 · Related articles · All 5 versions
