A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

3D object detection for autonomous driving: A comprehensive survey

J Mao, S Shi, X Wang, H Li - International Journal of Computer Vision, 2023 - Springer
Autonomous driving, in recent years, has been receiving increasing attention for its potential
to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving …

VoxelNeXt: Fully sparse VoxelNet for 3D object detection and tracking

Y Chen, J Liu, X Zhang, X Qi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers,
and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be …

Multimodal virtual point 3D detection

T Yin, X Zhou, P Krähenbühl - Advances in Neural …, 2021 - proceedings.neurips.cc
Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current
Lidar sensors still lag two decades behind traditional color cameras in terms of resolution …

CenterFormer: Center-based transformer for 3D object detection

Z Zhou, X Zhao, Y Wang, P Wang… - European Conference on …, 2022 - Springer
Query-based transformer has shown great potential in constructing long-range attention in
many image-domain tasks, but has rarely been considered in LiDAR-based 3D object …

LoGoNet: Towards accurate 3D object detection with local-to-global cross-modal fusion

X Li, T Ma, Y Hou, B Shi, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
LiDAR-camera fusion methods have shown impressive performance in 3D object detection.
Recent advanced multi-modal methods mainly perform global fusion, where image features …

DSVT: Dynamic sparse voxel transformer with rotated sets

H Wang, C Shi, S Shi, M Lei, S Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is
a fundamental problem in 3D perception. Compared with the customized sparse …

PersFormer: 3D lane detection via perspective transformer and the OpenLane benchmark

L Chen, C Sima, Y Li, Z Zheng, J Xu, X Geng… - … on Computer Vision, 2022 - Springer
Methods for 3D lane detection have been recently proposed to address the issue of
inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.) …

FlatFormer: Flattened window attention for efficient point cloud transformer

Z Liu, X Yang, H Tang, S Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g.,
texts and images). For 3D point cloud transformers, existing efforts focus primarily on …

CAT-Det: Contrastively augmented transformer for multi-modal 3D object detection

Y Zhang, J Chen, D Huang - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities
with complementary cues for 3D object detection. However, it is quite difficult to sufficiently …