VisText: A benchmark for semantically rich chart captioning

BJ Tang, A Boggust, A Satyanarayan - arXiv preprint arXiv:2307.05356, 2023 - arxiv.org
Captions that describe or explain charts help improve recall and comprehension of the
depicted data and provide a more accessible medium for people with visual disabilities …

Generating accurate caption units for figure captioning

X Qian, E Koh, F Du, S Kim, J Chan, RA Rossi… - Proceedings of the Web …, 2021 - dl.acm.org
Scientific-style figures are commonly used on the web to present numerical information.
Captions that tell accurate figure information and sound natural would significantly improve …

Do LVLMs understand charts? Analyzing and correcting factual errors in chart captioning

KH Huang, M Zhou, HP Chan, YR Fung… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in large vision-language models (LVLMs) have led to significant
progress in generating natural language descriptions for visual content and thus enhancing …

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org
Image captioning is a fundamental task in vision-language understanding, where the model
predicts a textual informative caption to a given input image. In this paper, we present a …

A picture is worth more than 77 text tokens: Evaluating CLIP-style models on dense captions

J Urbanek, F Bordes, P Astolfi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Curation methods for massive vision-language datasets trade off between dataset size and
quality. However, even the highest quality of available curated captions are far too short to …

Chart-to-Text: Generating natural language descriptions for charts by adapting the Transformer model

J Obeid, E Hoque - arXiv preprint arXiv:2010.09142, 2020 - arxiv.org
Information visualizations such as bar charts and line charts are very popular for exploring
data and communicating insights. Interpreting and making sense of such visualizations can …

AudioCaps: Generating captions for audios in the wild

CD Kim, B Kim, H Lee, G Kim - … of the 2019 Conference of the …, 2019 - aclanthology.org
We explore the problem of Audio Captioning: generating natural language description for
any kind of audio in the wild, which has been surprisingly unexplored in previous research …

Guiding image captioning models toward more specific captions

S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning is conventionally formulated as the task of generating captions that match
the conditional distribution of reference image-caption pairs. However, reference captions in …

Chart-to-Text: A large-scale benchmark for chart summarization

S Kantharaj, RTK Leong, X Lin, A Masry… - arXiv preprint arXiv …, 2022 - arxiv.org
Charts are commonly used for exploring data and communicating insights. Generating
natural language summaries from charts can be very helpful for people in inferring key …

CLIPScore: A reference-free evaluation metric for image captioning

J Hessel, A Holtzman, M Forbes, RL Bras… - arXiv preprint arXiv …, 2021 - arxiv.org
Image captioning has conventionally relied on reference-based automatic evaluations,
where machine captions are compared against captions written by humans. This is in …