Authors
Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, Hao Li
Publication date
2021/7/5
Journal
IEEE Transactions on Circuits and Systems for Video Technology
Volume
32
Issue
12
Pages
8165-8178
Publisher
IEEE
Description
Recent progress in video object detection (VOD) has shown that aggregating features from other frames to capture long-range contextual information is important for handling challenges in VOD such as partial occlusion and motion blur. To make feature aggregation more effective, we propose several improvements over previous work: (1) a class-aware pixel-level feature aggregation module, which characterizes a pixel by exploiting the context information in instances from both the current frame and other frames. Unlike the previous non-local operation, the proposed class-aware pixel-level feature aggregation filters out most of the noisy information from the large background area and from objects of different classes, and enhances the representation of a foreground pixel using only same-class instances, which carry limited ambiguous information; (2) a class-aware instance-level …
Scholar articles
L Han, P Wang, Z Yin, F Wang, H Li - IEEE Transactions on Circuits and Systems for Video …, 2021