Horizontal pyramid matching for person re-identification

Y Fu, Y Wei, Y Zhou, H Shi, G Huang, X Wang… - Proceedings of the …, 2019 - ojs.aaai.org
Despite the remarkable progress in person re-identification (Re-ID), such approaches still
suffer from the failure cases where the discriminative body parts are missing. To mitigate this …

MAN: Moment alignment network for natural language moment retrieval via iterative graph adjustment

D Zhang, X Dai, X Wang, YF Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com
This research strives for natural language moment retrieval in long, untrimmed video
streams. The problem is not trivial especially when a video contains multiple moments of …

MS-TCT: Multi-scale temporal ConvTransformer for action detection

R Dai, S Das, K Kahatapitiya… - Proceedings of the …, 2022 - openaccess.thecvf.com
Action detection is an essential and challenging task, especially for densely labelled
datasets of untrimmed videos. The temporal relation is complex in those datasets, including …

Learning an augmented RGB representation with cross-modal knowledge distillation for action detection

R Dai, S Das, F Bremond - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In video understanding, most cross-modal knowledge distillation (KD) methods are tailored
for classification tasks, focusing on the discriminative representation of the trimmed videos …

PDAN: Pyramid dilated attention network for action detection

R Dai, S Das, L Minciullo, L Garattoni… - Proceedings of the …, 2021 - openaccess.thecvf.com
Handling long and complex temporal information is an important factor for action detection
tasks. This challenge is further aggravated by densely distributed actions in untrimmed …

PAT: Position-aware transformer for dense multi-label action detection

F Sardari, A Mustafa, PJB Jackson… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present PAT, a transformer-based network that learns complex temporal co-occurrence
action dependencies in a video by exploiting multi-scale temporal features. In existing …

Toyota Smarthome Untrimmed: Real-world untrimmed videos for activity detection

R Dai, S Das, S Sharma, L Minciullo… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Designing activity detection systems that can be successfully deployed in daily-living
environments requires datasets that pose the challenges typical of real-world scenarios. In …

3D-TDC: A 3D temporal dilation convolution framework for video action recognition

Y Ming, F Feng, C Li, JH Xue - Neurocomputing, 2021 - Elsevier
Video action recognition is a vital area of computer vision. By adding temporal dimension
into convolution structure, 3D convolution neural network owns the capacity to extract spatio …

Dynamic temporal pyramid network: A closer look at multi-scale modeling for activity detection

D Zhang, X Dai, YF Wang - Computer Vision–ACCV 2018: 14th Asian …, 2019 - Springer
Recognizing instances at varying scales simultaneously is a fundamental challenge in
visual detection problems. While spatial multi-scale modeling has been well studied in …

CTRN: Class-temporal relational network for action detection

R Dai, S Das, F Bremond - arXiv preprint arXiv:2110.13473, 2021 - arxiv.org
Action detection is an essential and challenging task, especially for densely labelled
datasets of untrimmed videos. There are many real-world challenges in those datasets, such …