In humans, Attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources, attention mechanisms select, modulate, and …
In today's world, video captioning is extensively used in various applications for specially- abled and, more specifically, visually abled persons. With advancements in technology for …
TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts …
Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional neural networks for predicting gaze fixations. In this paper, we go beyond standard …
Abstract Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term …
H Ryu, S Kang, H Kang, CD Yoo - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
This paper considers a video caption generating network referred to as Semantic Grouping Network (SGN) that attempts (1) to group video frames with discriminating word phrases of …
Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks …
Q Zheng, C Wang, D Tao - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Existing methods on video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action. As a result, the learned models …
V Iashin, E Rahtu - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Dense video captioning is a task of localizing interesting events from an untrimmed video and producing textual description (captions) for each localized event. Most of the previous …