Robustness-aware 3D object detection in autonomous driving: A review and outlook

Z Song, L Liu, F Jia, Y Luo, C Jia… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
In the realm of modern autonomous driving, the perception system is indispensable for
accurately assessing the state of the surrounding environment, thereby enabling informed …

Detecting as labeling: Rethinking LiDAR-camera fusion in 3D object detection

J Huang, Y Ye, Z Liang, Y Shan, D Du - European Conference on …, 2025 - Springer
3D object detection with LiDAR-camera fusion encounters overfitting in algorithm
development derived from violating some fundamental rules. We refer to the data annotation …

Recent advances in multi-modal 3D scene understanding: A comprehensive survey and evaluation

Y Lei, Z Wang, F Chen, G Wang, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-modal 3D scene understanding has gained considerable attention due to its wide
applications in many areas, such as autonomous driving and human-computer interaction …

IS-Fusion: Instance-scene collaborative fusion for multimodal 3D object detection

J Yin, J Shen, R Chen, W Li, R Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D
space in autonomous driving scenarios. However, objects in the BEV representation typically …

GiT: Towards generalist vision transformer through universal language interface

H Wang, H Tang, L Jiang, S Shi, MF Naeem… - … on Computer Vision, 2025 - Springer
This paper proposes a simple yet effective framework, called GiT, simultaneously applicable
to various vision tasks with only a vanilla ViT. Motivated by the universality of the Multi-layer …

SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection

H Zhang, L Liang, P Zeng, X Song, Z Wang - European Conference on …, 2025 - Springer
Sparse 3D detectors have received significant attention since the query-based paradigm
embraces low latency without explicit dense BEV feature construction. However, these …

PRED: pre-training via semantic rendering on LiDAR point clouds

H Yang, H Wang, D Dai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Pre-training is crucial in 3D-related fields such as autonomous driving where point cloud
annotation is costly and challenging. Many recent studies on point cloud pre-training …

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

J Gunn, Z Lenyk, A Sharma, A Donati… - Proceedings of the …, 2024 - openaccess.thecvf.com
Combining complementary sensor modalities is crucial to providing robust perception for
safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art …

SAMFusion: Sensor-adaptive multimodal fusion for 3D object detection in adverse weather

E Palladin, R Dietze, P Narayanan, M Bijelic… - … on Computer Vision, 2025 - Springer
Multimodal sensor fusion is an essential capability for autonomous robots, enabling object
detection and decision-making in the presence of failing or uncertain inputs. While recent …

UniM²AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

J Zou, T Huang, G Yang, Z Guo, T Luo… - … on Computer Vision, 2025 - Springer
Masked Autoencoders (MAE) play a pivotal role in learning potent representations,
delivering outstanding results across various 3D perception tasks essential for autonomous …