Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots

Chen, Qi; Sun, Lin; Wang, Zhixin; Jia, Kui; Yuille, Alan

doi:10.1007/978-3-030-58589-1_5


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12366)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Accurate 3D object detection in LiDAR-based point clouds suffers from data sparsity and irregularity. Existing methods strive to organize the points regularly (e.g., by voxelization), pass them through a designed 2D/3D neural network, and then define object-level anchors that predict offsets of 3D bounding boxes using collective evidence from all the points on the objects of interest. In contrast to the state-of-the-art anchor-based methods, and based on the very nature of data sparsity, we observe that even points on an individual object part are informative about the semantics of the object. We thus argue in this paper for an approach opposite to existing methods that use object-level anchors. Inspired by compositional models, which represent an object as parts and their spatial relations, we propose to represent an object as a composition of its interior non-empty voxels, termed hotspots, and the spatial relations of hotspots. This gives rise to the representation of Object as Hotspots (OHS). Based on OHS, we further propose an anchor-free detection head with a novel ground truth assignment strategy that deals with inter-object point-sparsity imbalance to prevent the network from biasing towards objects with more points. Experimental results show that our proposed method works remarkably well on objects with a small number of points. Notably, our approach ranked 1st on the KITTI 3D Detection Benchmark for cyclist and pedestrian detection, and achieved state-of-the-art performance on the NuScenes 3D Detection Benchmark.
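
The notion of a hotspot as an interior non-empty voxel can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the axis-aligned box (no yaw), and the optional `max_hotspots` cap (keeping the voxels nearest the box center as one simple way to limit the advantage of densely sampled objects) are all assumptions made for illustration.

```python
from math import floor

def voxel_index(point, voxel_size):
    """Map an (x, y, z) point to the integer index of the voxel containing it."""
    return tuple(floor(c / voxel_size) for c in point)

def voxel_center(index, voxel_size):
    """Return the metric center of a voxel given its integer index."""
    return tuple((i + 0.5) * voxel_size for i in index)

def inside_box(point, box_min, box_max):
    """Axis-aligned containment test (assumption: the box has no rotation)."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, box_min, box_max))

def find_hotspots(points, box_min, box_max, voxel_size, max_hotspots=None):
    """Return the non-empty voxels of one object, nearest to the box center first.

    A voxel counts as a hotspot here if at least one point that lies inside
    the object's box falls into it. `max_hotspots` is a hypothetical cap.
    """
    occupied = {voxel_index(p, voxel_size)
                for p in points if inside_box(p, box_min, box_max)}
    box_center = tuple((lo + hi) / 2.0 for lo, hi in zip(box_min, box_max))

    def sq_dist_to_center(index):
        center = voxel_center(index, voxel_size)
        return sum((a - b) ** 2 for a, b in zip(center, box_center))

    # Sort by distance, then by index so the order is deterministic.
    ordered = sorted(occupied, key=lambda idx: (sq_dist_to_center(idx), idx))
    return ordered if max_hotspots is None else ordered[:max_hotspots]

# Toy scene: three points on the object, one far outside it.
points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (1.2, 0.2, 0.3), (5.0, 5.0, 5.0)]
hotspots = find_hotspots(points, (0.0, 0.0, 0.0), (2.0, 1.0, 1.0), voxel_size=0.5)
print(hotspots)
```

In this toy scene the two nearby points share one voxel, so the object yields two hotspots even though three points lie on it; the point outside the box contributes none.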



Author information

Authors and Affiliations

Samsung Strategy and Innovation Center, San Jose, CA, 95134, USA
Qi Chen & Lin Sun
The Johns Hopkins University, Baltimore, MD, 21218, USA
Qi Chen & Alan Yuille
South China University of Technology, Guangzhou, China
Zhixin Wang & Kui Jia
Pazhou Lab, Guangzhou, 510335, China
Kui Jia


Corresponding author

Correspondence to Lin Sun.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Chen, Q., Sun, L., Wang, Z., Jia, K., Yuille, A. (2020). Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_5

DOI: https://doi.org/10.1007/978-3-030-58589-1_5
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1
eBook Packages: Computer Science, Computer Science (R0)
