Video captioning with recurrent networks based on frame-and video-level features and visual...

M Amer, T Maul - Artificial Intelligence Review, 2019 - Springer

Artificial neural networks (ANNs) have achieved significant success in tackling classical and
modern machine learning problems. As learning problems grow in scale and complexity …

Cited by 104 · Related articles · All 9 versions

[PDF] google.com

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

Video captioning refers to automatic generate natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Cited by 406 · Related articles · All 4 versions

Hierarchical LSTMs with adaptive attention for visual captioning

L Gao, X Li, J Song, HT Shen - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org

Recent progress has been made in using attention based encoder-decoder framework for
image and video captioning. Most existing decoders apply the attention mechanism to every …

Cited by 286 · Related articles · All 5 versions

[PDF] thecvf.com

Video captioning with transferred semantic attributes

Y Pan, T Yao, H Li, T Mei - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

Automatically generating natural language descriptions of videos plays a fundamental
challenge for computer vision community. Most recent progress in this problem has been …

Cited by 421 · Related articles · All 9 versions

[PDF] springer.com

Movie description

A Rohrbach, A Torabi, M Rohrbach, N Tandon… - International Journal of …, 2017 - Springer

Audio description (AD) provides linguistic descriptions of movies and allows visually
impaired people to follow a movie along with their peers. Such descriptions are by design …

Cited by 433 · Related articles · All 17 versions

[PDF] thecvf.com

Audio visual scene-aware dialog

H Alamri, V Cartillier, A Das, J Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com

We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural
response to a question about a scene, given video and audio of the scene and the history of …

Cited by 209 · Related articles · All 10 versions

CAM-RNN: Co-attention model based RNN for video captioning

B Zhao, X Li, X Lu - IEEE Transactions on Image Processing, 2019 - ieeexplore.ieee.org

Video captioning is a technique that bridges vision and language together, for which both
visual information and text information are quite important. Typical approaches are based on …

Cited by 140 · Related articles · All 7 versions

[PDF] ieee.org

Clinical report guided retinal microaneurysm detection with multi-sieving deep learning

L Dai, R Fang, H Li, X Hou, B Sheng… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

Notice of Violation of IEEE Publication Principles: "Clinical Report Guided Retinal
Microaneurysm Detection With Multi-Sieving Deep Learning," by Ling Dai, Ruogu Fang …

Cited by 165 · Related articles · All 4 versions

[PDF] ijcai.org

[PDF] MAM-RNN: Multi-level attention model based RNN for video captioning

X Li, B Zhao, X Lu - IJCAI, 2017 - ijcai.org

Visual information is quite important for the task of video captioning. However, in the video,
there are a lot of uncorrelated content, which may cause interference to generate a correct …

Cited by 113 · Related articles · All 4 versions

Cross-modal video moment retrieval with spatial and language-temporal attention

B Jiang, X Huang, C Yang, J Yuan - Proceedings of the 2019 on …, 2019 - dl.acm.org

Given an untrimmed video and a description query, temporal moment retrieval aims to
localize the temporal segment within the video that best describes the textual query. Existing …

Cited by 88 · Related articles
