S Ding, P Zhao, X Zhang, R Qian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Transformers have become the primary backbone of the computer vision community due to their impressive performance. However, the unfriendly computation cost impedes their …
L Chen, Z Tong, Y Song, G Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Streaming video clips with large-scale video tokens impede vision transformers (ViTs) for efficient recognition, especially in video action detection where sufficient spatiotemporal …
We present a new algorithm for the selection of informative frames in video action recognition. Our approach is designed for aerial videos captured using a moving camera …
H Wang, W Zhang, G Liu - Applied Sciences, 2023 - mdpi.com
In the domain of video recognition, video transformers have demonstrated remarkable performance, albeit at significant computational cost. This paper introduces TSNet, an …
S Hwang, J Yoon, Y Lee, SJ Hwang - arXiv preprint arXiv:2211.10636, 2022 - arxiv.org
Recently emerged Masked Video Modeling techniques demonstrated their potential by significantly outperforming previous methods in self-supervised learning for video. However …
We present a new general learning approach, Prompt Learning for Action Recognition (PLAR), which leverages the strengths of prompt learning to guide the learning process. Our …
Z Feng, J Xu, L Ma, S Zhang - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Transformer has exhibited promising performance in various video recognition tasks but brings a huge computational cost in modeling spatial-temporal cues. This work aims to boost …
S Uddin, T Nawaz, J Ferryman, N Rashid… - IEEE …, 2024 - ieeexplore.ieee.org
Several efforts have been made to develop effective and robust vision-based solutions for human action recognition in aerial videos. Generally, the existing methods rely on the …