DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

C Min, D Zhao, L Xiao, J Zhao, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-centric autonomous driving has recently raised wide attention due to its lower cost.
Pre-training is essential for extracting a universal representation. However, current vision …

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

G Bang, K Choi, J Kim, D Kum… - Proceedings of the …, 2024 - openaccess.thecvf.com
The inherent noisy and sparse characteristics of radar data pose challenges in finding
effective representations for 3D object detection. In this paper, we propose RadarDistill, a …

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

S Sirko-Galouchenko, A Boulch… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a self-supervised pretraining method called OccFeat for camera-only
Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via …

SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

A Kumar, Y Guo, X Huang, L Ren… - Proceedings of the …, 2024 - openaccess.thecvf.com
Monocular 3D detectors achieve remarkable performance on cars and smaller objects.
However, their performance drops on larger objects, leading to fatal accidents. Some attribute …

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

L Zhao, J Song, KA Skinner - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the
top-performing sensor configuration. Still, LiDAR is relatively high cost, which hinders …

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …

BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy

Z Zhang, Y Zhang, L Wang, Y Wang, H Lu - arXiv preprint arXiv …, 2023 - arxiv.org
A popular approach for constructing bird's-eye-view (BEV) representation in 3D detection is
to lift 2D image features onto the viewing frustum space based on explicitly predicted depth …

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

HI Liu, C Wu, JH Cheng, W Chai, SY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Monocular 3D object detection (Mono3D) is an indispensable research topic in autonomous
driving, thanks to the cost-effective monocular camera sensors and its wide range of …

Distilling Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection

H Zheng, D Cao, J Xu, R Ai, W Gu, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Striking a balance between precision and efficiency presents a prominent challenge in the
bird's-eye-view (BEV) 3D object detection. Although previous camera-based BEV methods …

EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

T Wen, M Yang, Y Xu, D Yang - arxiv.org
3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of
autonomous driving, which represents the driving scene as uniformly partitioned 3D voxel …