Self-Supervised Visual Planning with Temporal Skip Connections.

Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: a review

CK Sahu, C Young, R Rai - International Journal of Production …, 2021 - Taylor & Francis

Augmented reality (AR) has proven to be an invaluable interactive medium to reduce
cognitive load by bridging the gap between the task-at-hand and relevant information by …

Cited by 177 · Related articles · All 6 versions

[PDF] zjujournals.com

Deep reinforcement learning: a survey

H Wang, N Liu, Y Zhang, D Feng, F Huang, D Li… - Frontiers of Information …, 2020 - Springer

Deep reinforcement learning (RL) has become one of the most popular topics in artificial
intelligence research. It has been widely used in various fields, such as end-to-end control …

Cited by 198 · Related articles · All 11 versions

[PDF] neurips.cc

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc

Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

Cited by 885 · Related articles · All 8 versions

[PDF] neurips.cc

MCVD: Masked conditional video diffusion for prediction, generation, and interpolation

V Voleti, A Jolicoeur-Martineau… - Advances in neural …, 2022 - proceedings.neurips.cc

Video prediction is a challenging task. The quality of video frames from current state-of-the-
art (SOTA) generative models tends to be poor and generalization beyond the training data …

Cited by 180 · Related articles · All 9 versions

[PDF] thecvf.com

MAGVIT: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …

Cited by 95 · Related articles · All 8 versions

[PDF] arxiv.org

NÜWA: Visual synthesis pre-training for neural visual world creation

C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang… - European conference on …, 2022 - Springer

This paper presents a unified multimodal pre-trained model called NÜWA that can generate
new or manipulate existing visual data (i.e., images and videos) for various visual synthesis …

Cited by 262 · Related articles · All 6 versions

[HTML] mdpi.com

[HTML] Diffusion probabilistic modeling for video generation

R Yang, P Srivastava, S Mandt - Entropy, 2023 - mdpi.com

Denoising diffusion probabilistic models are a promising new class of generative models
that mark a milestone in high-quality image generation. This paper showcases their ability to …

Cited by 179 · Related articles · All 9 versions

[PDF] arxiv.org

Mastering Atari with discrete world models

D Hafner, T Lillicrap, M Norouzi, J Ba - arXiv preprint arXiv:2010.02193, 2020 - arxiv.org

Intelligent agents need to generalize from past experience to achieve goals in complex
environments. World models facilitate such generalization and allow learning behaviors …

Cited by 731 · Related articles · All 7 versions

[PDF] arxiv.org

VideoGPT: Video generation using VQ-VAE and transformers

W Yan, Y Zhang, P Abbeel, A Srinivas - arXiv preprint arXiv:2104.10157, 2021 - arxiv.org

We present VideoGPT: a conceptually simple architecture for scaling likelihood based
generative modeling to natural videos. VideoGPT uses VQ-VAE that learns downsampled …

Cited by 316 · Related articles · All 2 versions

[PDF] arxiv.org

PredRNN: A recurrent neural network for spatiotemporal predictive learning

Y Wang, H Wu, J Zhang, Z Gao, J Wang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

The predictive learning of spatiotemporal sequences aims to generate future images by
learning from the historical context, where the visual dynamics are believed to have modular …

Cited by 296 · Related articles · All 6 versions
