Multi-modal 3D object detection in autonomous driving: A survey and taxonomy

L Wang, X Zhang, Z Song, J Bi, G Zhang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Autonomous vehicles require constant environmental perception to obtain the distribution of
obstacles to achieve safe driving. Specifically, 3D object detection is a vital functional …

Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe

H Li, C Sima, J Dai, W Wang, L Lu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention both from industry and academia. Conventional …

BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation

Z Liu, H Tang, A Amini, X Yang, H Mao… - … on robotics and …, 2023 - ieeexplore.ieee.org
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with …
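The point-level fusion this snippet refers to can be illustrated with a minimal sketch (not code from the paper; the intrinsics, feature map, and `paint_points` helper are hypothetical): each LiDAR point is projected into the image plane and the sampled image feature is concatenated onto the point.

```python
import numpy as np

def paint_points(points, image_feats, K):
    """Augment LiDAR points with image features (illustrative only).

    points:      (N, 3) xyz in the camera frame
    image_feats: (H, W, C) per-pixel image features
    K:           (3, 3) hypothetical camera intrinsics
    """
    H, W, _ = image_feats.shape
    uvw = points @ K.T                     # project onto the image plane
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
    u = np.clip(uv[:, 0].astype(int), 0, W - 1)
    v = np.clip(uv[:, 1].astype(int), 0, H - 1)
    # Concatenate the sampled image feature to each point: (N, 3 + C)
    return np.concatenate([points, image_feats[v, u]], axis=1)

pts = np.array([[0.5, 0.2, 5.0], [-1.0, 0.1, 8.0]])
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
feats = np.zeros((480, 640, 16), dtype=np.float32)
painted = paint_points(pts, feats, K)
print(painted.shape)  # (2, 19)
```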

BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera Via Spatiotemporal Transformers

Z Li, W Wang, H Li, E Xie, C Sima, T Lu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multi-modality fusion strategy is currently the de-facto most competitive solution for 3D
perception tasks. In this work, we present a new framework termed BEVFormer, which learns …

TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers

X Bai, Z Hu, X Zhu, Q Huang, Y Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
LiDAR and camera are two important sensors for 3D object detection in autonomous driving.
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …

BEVFusion: A simple and robust LiDAR-camera fusion framework

T Liang, H Xie, K Yu, Z Xia, Z Lin… - Advances in …, 2022 - proceedings.neurips.cc
Fusing the camera and LiDAR information has become a de-facto standard for 3D object
detection tasks. Current methods rely on point clouds from the LiDAR sensor as queries to …

Virtual sparse convolution for multimodal 3D object detection

H Wu, C Wen, S Shi, X Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recently, virtual/pseudo-point-based 3D object detection that seamlessly fuses
RGB images and LiDAR data by depth completion has gained great attention. However …
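The virtual-point idea mentioned here can be sketched briefly (an illustration, not the paper's pipeline; the intrinsics and `depth_to_virtual_points` helper are assumptions): a completed dense depth map is back-projected so every pixel becomes a 3D "virtual point" that densifies the sparse LiDAR cloud.

```python
import numpy as np

def depth_to_virtual_points(depth, K):
    """Back-project a completed depth map into 3D virtual points.

    depth: (H, W) dense depth predicted by depth completion
    K:     (3, 3) hypothetical camera intrinsics
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # per-pixel viewing rays
    return rays * depth.reshape(-1, 1)     # scale rays by depth -> (H*W, 3) xyz

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
depth = np.full((480, 640), 10.0)
virtual_pts = depth_to_virtual_points(depth, K)
print(virtual_pts.shape)  # (307200, 3)
```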

Unifying voxel-based representation with transformer for 3D object detection

Y Li, Y Chen, X Qi, Z Li, J Sun… - Advances in Neural …, 2022 - proceedings.neurips.cc
In this work, we present a unified framework for multi-modality 3D object detection, named
UVTR. The proposed method aims to unify multi-modality representations in the voxel space …
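Unifying modalities in voxel space, as this snippet describes, starts from assigning features to a shared voxel grid. A minimal sketch (the grid resolution and range are assumptions, not UVTR's actual configuration):

```python
import numpy as np

def voxelize(points, voxel_size=0.5, pc_range=(-50.0, -50.0, -5.0)):
    """Map (N, 3) xyz points to integer indices in a shared voxel grid.

    voxel_size and pc_range (grid origin) are illustrative defaults;
    features from any modality placed at the same index can then be fused.
    """
    origin = np.asarray(pc_range)
    return np.floor((points - origin) / voxel_size).astype(int)

pts = np.array([[0.0, 0.0, 0.0], [0.6, -0.3, 1.2]])
print(voxelize(pts))  # [[100 100  10]
                      #  [101  99  12]]
```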

Focal sparse convolutional networks for 3D object detection

Y Chen, Y Li, X Zhang, J Sun… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Non-uniformed 3D sparse data, e.g., point clouds or voxels in different spatial positions, make
contribution to the task of 3D object detection in different ways. Existing basic components in …

FUTR3D: A unified sensor fusion framework for 3D detection

X Chen, T Zhang, Y Wang, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sensor fusion is an essential topic in many perception systems, such as autonomous driving
and robotics. Existing multi-modal 3D detection models usually involve customized designs …