A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching

P Das, C Xu, RF Doell, JJ Corso - Proceedings of the IEEE …, 2013 - openaccess.thecvf.com
The problem of describing images through natural language has gained importance in the
computer vision community. Solutions to image description have either focused on a top …

Translating video content to natural language descriptions

M Rohrbach, W Qiu, I Titov, S Thater… - Proceedings of the …, 2013 - openaccess.thecvf.com
Humans use rich natural language to describe and communicate visual perceptions. In
order to provide natural language descriptions for visual content, this paper combines two …

MSR-VTT: A large video description dataset for bridging video and language

J Xu, T Mei, T Yao, Y Rui - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
While there has been increasing interest in the task of describing video with natural
language, current computer vision algorithms are still severely limited in terms of the …

Dense captioning with joint inference and visual context

L Yang, K Tang, J Yang, LJ Li - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Dense captioning is a newly emerging computer vision topic for understanding images with
dense language descriptions. The goal is to densely detect visual concepts (e.g., objects …

From captions to visual concepts and back

H Fang, S Gupta, F Iandola… - Proceedings of the …, 2015 - openaccess.thecvf.com
This paper presents a novel approach for automatically generating image descriptions:
visual detectors, language models, and multimodal similarity models learnt directly from a …

Image generation from scene graphs

J Johnson, A Gupta, L Fei-Fei - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
To truly understand the visual world our models should be able not only to recognize images
but also generate them. To this end, there has been exciting recent progress on generating …

Grounded video description

L Zhou, Y Kalantidis, X Chen… - Proceedings of the …, 2019 - openaccess.thecvf.com
Video description is one of the most challenging problems in vision and language
understanding due to the large variability both on the video and language side. Models …

Describing videos by exploiting temporal structure

L Yao, A Torabi, K Cho, N Ballas… - Proceedings of the …, 2015 - openaccess.thecvf.com
Recent progress in using recurrent neural networks (RNNs) for image description has
motivated the exploration of their application for video description. However, while images …

Jointly localizing and describing events for dense video captioning

Y Li, T Yao, Y Pan, H Chao… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
Automatically describing a video with natural language is regarded as a fundamental
challenge in computer vision. The problem nevertheless is not trivial especially when a …

Deep visual-semantic alignments for generating image descriptions

A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org
We present a model that generates natural language descriptions of images and their
regions. Our approach leverages datasets of images and their sentence descriptions to …