Automated chart analysis has vast potential to improve the accessibility of charts for a wider audience, e.g., people with visual impairments or other disabilities, by generating captions for …
P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine …
A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each …
We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the task as a structured prediction problem …
Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications for human-computer interaction and image-text …
Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …
Much recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …
We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related natural language question …
X Li, S Jiang - IEEE Transactions on Multimedia, 2019 - ieeexplore.ieee.org
Automatically describing the content of an image has been attracting considerable research attention in the multimedia field. To represent the content of an image, many approaches …