A review of modularization techniques in artificial neural networks

M Amer, T Maul - Artificial Intelligence Review, 2019 - Springer
Artificial neural networks (ANNs) have achieved significant success in tackling classical and
modern machine learning problems. As learning problems grow in scale and complexity …

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Video captioning refers to automatically generating natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Hierarchical LSTMs with adaptive attention for visual captioning

L Gao, X Li, J Song, HT Shen - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org
Recent progress has been made in using attention based encoder-decoder framework for
image and video captioning. Most existing decoders apply the attention mechanism to every …

Video captioning with transferred semantic attributes

Y Pan, T Yao, H Li, T Mei - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Automatically generating natural language descriptions of videos poses a fundamental
challenge for the computer vision community. Most recent progress in this problem has been …

Movie description

A Rohrbach, A Torabi, M Rohrbach, N Tandon… - International Journal of …, 2017 - Springer
Audio description (AD) provides linguistic descriptions of movies and allows visually
impaired people to follow a movie along with their peers. Such descriptions are by design …

Audio visual scene-aware dialog

H Alamri, V Cartillier, A Das, J Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural
response to a question about a scene, given video and audio of the scene and the history of …

CAM-RNN: Co-attention model based RNN for video captioning

B Zhao, X Li, X Lu - IEEE Transactions on Image Processing, 2019 - ieeexplore.ieee.org
Video captioning is a technique that bridges vision and language together, for which both
visual information and text information are quite important. Typical approaches are based on …

Clinical report guided retinal microaneurysm detection with multi-sieving deep learning

L Dai, R Fang, H Li, X Hou, B Sheng… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Notice of Violation of IEEE Publication Principles "Clinical Report Guided Retinal
Microaneurysm Detection With Multi-Sieving Deep Learning," by Ling Dai, Ruogu Fang …

MAM-RNN: Multi-level attention model based RNN for video captioning

X Li, B Zhao, X Lu - IJCAI, 2017 - ijcai.org
Visual information is quite important for the task of video captioning. However, in the video,
there is a lot of uncorrelated content, which may interfere with generating a correct …

Cross-modal video moment retrieval with spatial and language-temporal attention

B Jiang, X Huang, C Yang, J Yuan - Proceedings of the 2019 on …, 2019 - dl.acm.org
Given an untrimmed video and a description query, temporal moment retrieval aims to
localize the temporal segment within the video that best describes the textual query. Existing …