From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Accessible visualization: Design space, opportunities, and challenges

NW Kim, SC Joyner, A Riegelhuth… - Computer graphics …, 2021 - Wiley Online Library
Visualizations are now widely used across disciplines to understand and communicate data.
The benefit of visualizations lies in leveraging our natural visual perception. However, the …

AI-generated content (AIGC): A survey

J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …

VizWiz grand challenge: Answering visual questions from blind people

D Gurari, Q Li, AJ Stangl, A Guo, C Lin… - Proceedings of the …, 2018 - openaccess.thecvf.com
The study of algorithms to automatically answer visual questions currently is motivated by
visual question answering (VQA) datasets constructed in artificial VQA settings. We propose …

Region-aware image captioning via interaction learning

AA Liu, Y Zhai, N Xu, W Nie, W Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Image captioning is one of the primary goals in computer vision which aims to automatically
generate natural descriptions for images. Intuitively, human visual system can notice some …

Improved image captioning via policy gradient optimization of SPIDEr

S Liu, Z Zhu, N Ye, S Guadarrama… - Proceedings of the …, 2017 - openaccess.thecvf.com
Current image captioning methods are usually trained via maximum likelihood estimation.
However, the log-likelihood score of a caption does not correlate well with human …

Connecting vision and language with localized narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - Computer Vision–ECCV …, 2020 - Springer
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Captioning images taken by people who are blind

D Gurari, Y Zhao, M Zhang, N Bhattacharya - Computer Vision–ECCV …, 2020 - Springer
While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …

Understanding the effect of out-of-distribution examples and interactive explanations on human-AI decision making

H Liu, V Lai, C Tan - Proceedings of the ACM on Human-Computer …, 2021 - dl.acm.org
Although AI holds promise for improving human decision making in societally critical
domains, it remains an open question how human-AI teams can reliably outperform AI alone …

Taxonomizing and measuring representational harms: A look at image tagging

J Katzman, A Wang, M Scheuerman… - Proceedings of the …, 2023 - ojs.aaai.org
In this paper, we examine computational approaches for measuring the "fairness" of image
tagging systems, finding that they cluster into five distinct categories, each with its own …