Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

MUVO: A multimodal generative world model for autonomous driving with geometric representations

D Bogdoll, Y Yang, JM Zöllner - arXiv preprint arXiv:2311.11762, 2023 - arxiv.org
Learning unsupervised world models for autonomous driving has the potential to improve
the reasoning capabilities of today's systems dramatically. However, most work neglects the …

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

S Sirko-Galouchenko, A Boulch… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a self-supervised pretraining method called OccFeat for camera-only
Bird's-Eye-View (BEV) segmentation networks. With OccFeat we pretrain a BEV network via …

CLIP-DINOiser: Teaching CLIP a few DINO tricks

M Wysoczańska, O Siméoni, M Ramamonjisoa… - arXiv preprint arXiv …, 2023 - arxiv.org
The popular CLIP model displays impressive zero-shot capabilities thanks to its seamless
interaction with arbitrary text prompts. However, its lack of spatial awareness makes it …

Vision-based 3D occupancy prediction in autonomous driving: a review and outlook

Y Zhang, J Zhang, Z Wang, J Xu, D Huang - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, autonomous driving has garnered escalating attention for its potential to
relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction …

OccFusion: Depth estimation free multi-sensor fusion for 3D occupancy prediction

J Zhang, Y Ding - arXiv preprint arXiv:2403.05329, 2024 - arxiv.org
3D occupancy prediction based on multi-sensor fusion, crucial for a reliable autonomous
driving system, enables fine-grained understanding of 3D scenes. Previous fusion-based 3D …

OccFiner: Offboard occupancy refinement with hybrid propagation

H Shi, S Wang, J Zhang, X Yin, Z Wang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC),
presents a significant challenge in computer vision. Previous methods, confined to onboard …

VEON: Vocabulary-Enhanced Occupancy Prediction

J Zheng, P Tang, Z Wang, G Wang, X Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Perceiving the world as 3D occupancy supports embodied agents to avoid collision with any
types of obstacle. While open-vocabulary image understanding has prospered recently, how …

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

F Ding, X Wen, Y Zhu, Y Li, CX Lu - arXiv preprint arXiv:2405.14014, 2024 - arxiv.org
3D occupancy-based perception pipeline has significantly advanced autonomous driving by
capturing detailed scene descriptions and demonstrating strong generalizability across …

LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering

S Boeder, F Gigengack, B Risse - arXiv preprint arXiv:2407.17310, 2024 - arxiv.org
Semantic occupancy has recently gained significant traction as a prominent method for 3D
scene representation. However, most existing camera-based methods rely on costly …