Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

ASM-Loc: Action-aware segment modeling for weakly-supervised temporal action localization

B He, X Yang, L Kang, Z Cheng… - Proceedings of the …, 2022 - openaccess.thecvf.com
Weakly-supervised temporal action localization aims to recognize and localize action
segments in untrimmed videos given only video-level action labels for training. Without the …

Align and attend: Multimodal summarization with dual contrastive losses

B He, J Wang, J Qiu, T Bui… - Proceedings of the …, 2023 - openaccess.thecvf.com
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …

Chop & learn: Recognizing and generating object-state compositions

N Saini, H Wang, A Swaminathan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recognizing and generating object-state compositions has been a challenging task,
especially when generalizing to unseen compositions. In this paper, we study the task of …

Towards scalable neural representation for diverse videos

B He, X Yang, H Wang, Z Wu, H Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Implicit neural representations (INR) have gained increasing attention in representing 3D
scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV) …

OmniVid: A generative framework for universal video understanding

J Wang, D Chen, C Luo, B He, L Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
The core of video understanding tasks such as recognition, captioning, and tracking is to
automatically detect objects or actions in a video and analyze their temporal evolution …

Efficient video transformers with spatial-temporal token selection

J Wang, X Yang, H Li, L Liu, Z Wu, YG Jiang - European Conference on …, 2022 - Springer
Video transformers have achieved impressive results on major video recognition
benchmarks, which however suffer from high computational cost. In this paper, we present …

MetaGait: Learning to learn an omni sample adaptive representation for gait recognition

H Dou, P Zhang, W Su, Y Yu, X Li - European Conference on Computer …, 2022 - Springer
Gait recognition, which aims at identifying individuals by their walking patterns, has recently
drawn increasing research attention. However, gait recognition still suffers from the conflicts …

Improving RGB-D salient object detection via modality-aware decoder

M Song, W Song, G Yang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Most existing RGB-D salient object detection (SOD) methods are primarily focusing on
cross-modal and cross-level saliency fusion, which has been proved to be efficient and effective …

Efficient spatio-temporal modeling methods for real-time violence recognition

MS Kang, RH Park, HM Park - IEEE Access, 2021 - ieeexplore.ieee.org
Violence recognition is challenging since recognition must be performed on videos acquired
by a lot of surveillance cameras at any time or place. It should make reliable detections in …