Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

S Sirko-Galouchenko, A Boulch… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a self-supervised pretraining method called OccFeat for camera-only Bird's-
Eye-View (BEV) segmentation networks. With OccFeat we pretrain a BEV network via …

Improving Distant 3D Object Detection Using 2D Box Supervision

Z Yang, Z Yu, C Choy, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Improving the detection of distant 3D objects is an important yet challenging task. For
camera-based 3D perception, the annotation of 3D bounding boxes relies heavily on LiDAR for accurate …

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Learning Manipulation by Predicting Interaction

J Zeng, Q Bu, B Wang, W Xia, L Chen, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Representation learning approaches for robotic manipulation have boomed in recent years.
Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large …

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

E Ma, L Zhou, T Tang, Z Zhang, D Han, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Using generative models to synthesize new data has become a de facto standard in
autonomous driving to address the data scarcity issue. Though existing approaches are able …

BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

Y Zhang, S Gong, K Xiong, X Ye, X Tan, F Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
World models are receiving increasing attention in autonomous driving for their ability to
predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that …

Generalized Predictive Model for Autonomous Driving Supplementary Material

J Yang, S Gao, Y Qiu, L Chen, T Li, B Dai, K Chitta… - openaccess.thecvf.com
Video is a particularly universal and scalable target given a wealth of uncalibrated driving
videos. Different from BEV representations [25, 37] that require camera extrinsic parameters …

D2-World: An Efficient World Model through Decoupled Dynamic Flow

H Zhang, X Yan, Y Xue, Z Guo, S Cui, Z Li, B Liu - opendrivelab.github.io
This technical report summarizes the second-place solution for the Predictive World Model
Challenge held at the CVPR-2024 Workshop on Foundation Models for Autonomous …