A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Automatic chart understanding: a review

AM Farahani, P Adibi, MS Ehsani, HP Hutter… - IEEE …, 2023 - ieeexplore.ieee.org
Automated chart analysis has vast potential to improve the accessibility of charts for a wider
audience, e.g., people with visual impairments or other disabilities, by generating captions for …

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …

Semantic compositional networks for visual captioning

Z Gan, C Gan, X He, Y Pu, K Tran… - Proceedings of the …, 2017 - openaccess.thecvf.com
A Semantic Compositional Network (SCN) is developed for image captioning, in
which semantic concepts (i.e., tags) are detected from the image, and the probability of each …

Video summarization with long short-term memory

K Zhang, WL Chao, F Sha, K Grauman - … 14, 2016, Proceedings, Part VII 14, 2016 - Springer
We propose a novel supervised learning technique for summarizing videos by automatically
selecting keyframes or key subshots. Casting the task as a structured prediction problem …

Grounding of textual phrases in images by reconstruction

A Rohrbach, M Rohrbach, R Hu, T Darrell… - Computer Vision–ECCV …, 2016 - Springer
Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a
challenging problem with many applications for human-computer interaction and image-text …

Image captioning and visual question answering based on attributes and external knowledge

Q Wu, C Shen, P Wang, A Dick… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Much of the recent progress in Vision-to-Language problems has been achieved through a
combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …

What value do explicit high level concepts have in vision to language problems?

Q Wu, C Shen, L Liu, A Dick… - Proceedings of the …, 2016 - cv-foundation.org
Much recent progress in Vision-to-Language (V2L) problems has been achieved through a
combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …

ABC-CNN: An attention based convolutional neural network for visual question answering

K Chen, J Wang, LC Chen, H Gao, W Xu… - arXiv preprint arXiv …, 2015 - arxiv.org
We propose a novel attention-based deep learning architecture for the visual question
answering (VQA) task. Given an image and an image-related natural language question …

Know more say less: Image captioning based on scene graphs

X Li, S Jiang - IEEE Transactions on Multimedia, 2019 - ieeexplore.ieee.org
Automatically describing the content of an image has been attracting considerable research
attention in the multimedia field. To represent the content of an image, many approaches …