Hierarchical boundary-aware neural encoder for video captioning

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org

Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Cited by 238 · Related articles · All 10 versions

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 177 · Related articles · All 8 versions
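
The snippet above describes attention as selecting and modulating competing inputs. The following is a minimal illustration only, not the survey's taxonomy or any specific model it covers. It sketches scaled dot-product soft attention in pure Python under a captioning-style assumption: `keys` and `values` are hypothetical per-frame feature vectors and `query` is a hypothetical decoder state.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Soft attention: weight each value by softmax(query . key / sqrt(d)).

    Assumption (illustrative only): each key/value pair stands for one
    hypothetical video frame feature; the query is a decoder state.
    """
    d = len(query)
    # Relevance score of each frame for the current query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Normalised weights: non-negative and summing to 1 ("select and modulate").
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: weighted average of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights
```

When all keys are identical the weights become uniform and the context vector is simply the mean of the values; a key better aligned with the query receives proportionally more weight.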

Video captioning: a review of theory, techniques and practices.

V Jain, F Al-Turjman, G Chaudhary… - Multimedia Tools & …, 2022 - search.ebscohost.com

In today's world, video captioning is extensively used in various applications for specially-
abled and, more specifically, visually abled persons. With advancements in technology for …

Cited by 28 · Related articles · All 3 versions

[PDF] thecvf.com

Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

Cited by 287 · Related articles · All 11 versions

[PDF] arxiv.org

Predicting human eye fixations via an lstm-based saliency attentive model

M Cornia, L Baraldi, G Serra… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional
neural networks for predicting gaze fixations. In this paper, we go beyond standard …

Cited by 663 · Related articles · All 16 versions

[PDF] arxiv.org

Working memory connections for LSTM

F Landi, L Baraldi, M Cornia, R Cucchiara - Neural Networks, 2021 - Elsevier

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of
gating mechanisms to mitigate exploding and vanishing gradients when learning long-term …

Cited by 107 · Related articles · All 9 versions
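
The gating idea mentioned in this abstract can be sketched with a standard single-unit LSTM cell in pure Python. This is the textbook LSTM, not the working-memory connections the paper proposes; the `params` layout (one hypothetical `(w_x, w_h, b)` triple per gate) is chosen only for readability.

```python
import math
from typing import Dict, Tuple

# Hypothetical parameter layout: gate name -> (input weight, recurrent weight, bias).
Params = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, params: Params) -> Tuple[float, float]:
    """One time step of a scalar (1-unit) standard LSTM cell."""

    def pre(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new content to write
    f = sigmoid(pre("forget"))   # how much old cell state to keep
    o = sigmoid(pre("output"))   # how much cell state to expose
    g = math.tanh(pre("cell"))   # candidate content
    # Additive cell update: holding the gates fixed, dc/dc_prev = f.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

Because the cell state is updated additively, a forget gate close to 1 lets the state (and, along this path, its gradient) persist over many steps instead of shrinking or blowing up multiplicatively, which is the motivation the abstract points to.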

[PDF] aaai.org

Semantic grouping network for video captioning

H Ryu, S Kang, H Kang, CD Yoo - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

This paper considers a video caption generating network referred to as Semantic Grouping
Network (SGN) that attempts (1) to group video frames with discriminating word phrases of …

Cited by 131 · Related articles · All 8 versions

[PDF] thecvf.com

Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning

N Aafaq, N Akhtar, W Liu, SZ Gilani… - Proceedings of the …, 2019 - openaccess.thecvf.com

Automatic generation of video captions is a fundamental challenge in computer vision.
Recent techniques typically employ a combination of Convolutional Neural Networks …

Cited by 271 · Related articles · All 11 versions

[PDF] thecvf.com

Syntax-aware action targeting for video captioning

Q Zheng, C Wang, D Tao - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

Existing methods on video captioning have made great efforts to identify objects/instances in
videos, but few of them emphasize the prediction of action. As a result, the learned models …

Cited by 184 · Related articles · All 5 versions

[PDF] thecvf.com

Multi-modal dense video captioning

V Iashin, E Rahtu - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

Dense video captioning is a task of localizing interesting events from an untrimmed video
and producing textual description (captions) for each localized event. Most of the previous …

Cited by 190 · Related articles · All 9 versions
