C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Video captioning refers to automatically generating natural language sentences that summarize the video contents. Inspired by the visual attention mechanism of human beings …
L Gao, X Li, J Song, HT Shen - IEEE Transactions on Pattern …, 2019 - ieeexplore.ieee.org
Recent progress has been made in using attention based encoder-decoder framework for image and video captioning. Most existing decoders apply the attention mechanism to every …
Y Pan, T Yao, H Li, T Mei - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community. Most recent progress in this problem has been …
Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design …
We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural response to a question about a scene, given video and audio of the scene and the history of …
B Zhao, X Li, X Lu - IEEE Transactions on Image Processing, 2019 - ieeexplore.ieee.org
Video captioning is a technique that bridges vision and language, for which both visual information and text information are quite important. Typical approaches are based on …
L Dai, R Fang, H Li, X Hou, B Sheng… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Notice of Violation of IEEE Publication Principles: "Clinical Report Guided Retinal Microaneurysm Detection With Multi-Sieving Deep Learning," by Ling Dai, Ruogu Fang …
Visual information is quite important for the task of video captioning. However, a video contains a lot of uncorrelated content, which may interfere with generating a correct …
B Jiang, X Huang, C Yang, J Yuan - Proceedings of the 2019 on …, 2019 - dl.acm.org
Given an untrimmed video and a description query, temporal moment retrieval aims to localize the temporal segment within the video that best describes the textual query. Existing …