Scientific-style figures are commonly used on the web to present numerical information. Captions that tell accurate figure information and sound natural would significantly improve …
Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing …
Image captioning is a fundamental task in vision-language understanding, where the model predicts a textual informative caption to a given input image. In this paper, we present a …
Curation methods for massive vision-language datasets trade off between dataset size and quality. However even the highest quality of available curated captions are far too short to …
Information visualizations such as bar charts and line charts are very popular for exploring data and communicating insights. Interpreting and making sense of such visualizations can …
CD Kim, B Kim, H Lee, G Kim - … of the 2019 Conference of the …, 2019 - aclanthology.org
We explore the problem of Audio Captioning: generating natural language description for any kind of audio in the wild, which has been surprisingly unexplored in previous research …
S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning is conventionally formulated as the task of generating captions that match the conditional distribution of reference image-caption pairs. However, reference captions in …
S Kantharaj, RTK Leong, X Lin, A Masry… - arXiv preprint arXiv …, 2022 - arxiv.org
Charts are commonly used for exploring data and communicating insights. Generating natural language summaries from charts can be very helpful for people in inferring key …
Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in …