M Rohrbach, W Qiu, I Titov, S Thater… - Proceedings of the …, 2013 - openaccess.thecvf.com
Humans use rich natural language to describe and communicate visual perceptions. In order to provide natural language descriptions for visual content, this paper combines two …
J Xu, T Mei, T Yao, Y Rui - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the …
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects …
This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a …
To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating …
Video description is one of the most challenging problems in vision and language understanding due to the large variability both on the video and language side. Models …
Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images …
Y Li, T Yao, Y Pan, H Chao… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
Automatically describing a video with natural language is regarded as a fundamental challenge in computer vision. The problem nevertheless is not trivial especially when a …
A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org
We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to …