LIFTS: Lidar and monocular image fusion for multi-object tracking and segmentation

H Zhang, Y Wang, J Cai, HM Hsu, H Ji… - … , IEEE Conference on …, 2020 - academia.edu
BMTT Challenge Workshop, IEEE Conference on Computer Vision and Pattern …, 2020
Abstract
In recent years, the computer vision community has made significant progress in multi-object tracking (MOT) and video object segmentation (VOS). Further progress can be achieved by effectively combining three tasks: detection, segmentation, and tracking. In this work, we propose a multi-stage framework called “Lidar and monocular Image Fusion based multi-object Tracking and Segmentation (LIFTS)” for multi-object tracking and segmentation (MOTS). In the first stage, we use a 3D Part-Aware and Aggregation Network detector on the point cloud data to obtain 3D object locations. A graph-based 3D TrackletNet Tracker (3D TNT), which takes both CNN appearance features and the spatial information of detections, is then applied to robustly associate objects over time. The second stage uses a Cascade Mask R-CNN based network with a PointRend head to obtain instance segmentation results from monocular images. Its pre-computed region proposals are generated by projecting the 3D detections from the first stage onto the 2D image plane. In the last stage, two post-processing techniques are applied: (1) the generated masks are refined by an optical-flow guided instance segmentation network; (2) object re-identification (ReID) is applied to recover from ID switches caused by long-term occlusion. Overall, the proposed framework is evaluated on the BMTT Challenge 2020 Track 2 (KITTI-MOTS) dataset and achieves an sMOTSA of 79.6 for Car and 64.9 for Pedestrian, ranking 2nd in the competition.
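The step that turns first-stage 3D detections into 2D region proposals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard 3x4 pinhole projection matrix, boxes already expressed in camera coordinates, and axis-aligned boxes with no yaw. The function names (`box3d_corners`, `project_point`, `box3d_to_proposal`) and the example matrix are hypothetical.

```python
from itertools import product


def box3d_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box.

    Assumption: no rotation (yaw) is applied, to keep the sketch short.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]


def project_point(P, point):
    """Apply a 3x4 projection matrix P to a 3D point in camera coordinates."""
    x, y, z = point
    u, v, d = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u / d, v / d


def box3d_to_proposal(P, center, size, image_size):
    """Project a 3D box and return the enclosing 2D box (x1, y1, x2, y2).

    Returns None if any corner lies at or behind the camera plane, or if
    the clipped box is empty. Both rules are simplifications of this sketch.
    """
    corners = box3d_corners(center, size)
    if any(c[2] <= 0 for c in corners):
        return None
    pts = [project_point(P, c) for c in corners]
    width, height = image_size
    x1 = max(0.0, min(p[0] for p in pts))
    y1 = max(0.0, min(p[1] for p in pts))
    x2 = min(float(width), max(p[0] for p in pts))
    y2 = min(float(height), max(p[1] for p in pts))
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


# Hypothetical intrinsics: focal length 100 px, principal point (50, 50).
P_EXAMPLE = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 50.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
proposal = box3d_to_proposal(P_EXAMPLE, (0.0, 0.0, 10.0), (2.0, 2.0, 2.0), (100, 100))
```

Taking the enclosing rectangle of the projected corners gives a loose but conservative proposal; a segmentation head then only needs to refine the mask inside it.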