Video description is the automatic generation of natural language sentences that describe the contents of a given video. It has applications in human-robot interaction, helping the …
J Xu, T Mei, T Yao, Y Rui - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the …
A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org
We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to …
Abstract Models comprised of deep convolutional network layers have dominated recent image interpretation tasks; we investigate whether models which are also compositional, or" …
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with …
Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images …
Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks …
S Kazemzadeh, V Ordonez, M Matten… - Proceedings of the 2014 …, 2014 - aclanthology.org
In this paper we introduce a new game to crowd-source natural language referring expressions. By designing a two player game, we can both collect and verify referring …
This thesis develops a method for automatically constructing, visualizing and describing a large class of models, useful for forecasting and finding structure in domains such as time …