Spatio-temporal learnable proposals for end-to-end video object detection

KA Hashmi, D Stricker, MZ Afzal - arXiv preprint arXiv:2210.02368, 2022 - arxiv.org
This paper presents the novel idea of generating object proposals by leveraging temporal
information for video object detection. The feature aggregation in modern region-based …

Progressive sparse local attention for video object detection

C Guo, B Fan, J Gu, Q Zhang, S Xiang… - Proceedings of the …, 2019 - openaccess.thecvf.com
Transferring image-based object detectors to the domain of videos remains a challenging
problem. Previous efforts mostly exploit optical flow to propagate features across frames …

Object detection in video with spatial-temporal context aggregation

H Luo, L Huang, H Shen, Y Li, C Huang… - arXiv preprint arXiv …, 2019 - arxiv.org
Recent cutting-edge feature aggregation paradigms for video object detection rely on
inferring feature correspondence. The feature correspondence estimation problem is …

Class-aware feature aggregation network for video object detection

L Han, P Wang, Z Yin, F Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent progress in video object detection (VOD) has shown that aggregating features from
other frames to capture long-range contextual information is very important to deal with the …

MAMBA: Multi-level aggregation via memory bank for video object detection

G Sun, Y Hua, G Hu, N Robertson - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
State-of-the-art video object detection methods maintain a memory structure, either a sliding
window or a memory queue, to enhance the current frame using attention mechanisms …

Mining inter-video proposal relations for video object detection

M Han, Y Wang, X Chang, Y Qiao - …, Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
Recent studies have shown that context aggregating information from proposals in different
frames can clearly enhance the performance of video object detection. However, these …

Learning where to focus for efficient video object detection

Z Jiang, Y Liu, C Yang, J Liu, P Gao, Q Zhang… - Computer Vision–ECCV …, 2020 - Springer
Transferring existing image-based detectors to the video is non-trivial since the quality of
frames is always deteriorated by part occlusion, rare pose, and motion blur. Previous …

Semi-supervised DFF: Decoupling detection and feature flow for video object detectors

G Han, X Zhang, C Li - Proceedings of the 26th ACM international …, 2018 - dl.acm.org
For efficient video object detection, our detector consists of a spatial module and a temporal
module. The spatial module aims to detect objects in static frames using convolutional …

Identity-Consistent Aggregation for Video Object Detection

C Deng, D Chen, Q Wu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In Video Object Detection (VID), a common practice is to leverage the rich temporal
contexts from the video to enhance the object representations in each frame. Existing …

TransVOD: end-to-end video object detection with spatial-temporal transformers

Q Zhou, X Li, L He, Y Yang, G Cheng… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …