Despite the success of various text generation metrics such as BERTScore, it is still difficult to evaluate image captions without enough reference captions due to the diversity of the …
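The reference-scarcity problem above can be illustrated with a toy metric. The sketch below (stdlib only; the function name and F1-style scoring are illustrative assumptions, not any paper's actual metric) computes the best unigram-overlap F1 of a candidate caption against a reference set — a crude stand-in for reference-based metrics like BLEU or BERTScore, which compare a candidate against references with far smarter matching. A correct paraphrase scores poorly against a single reference and recovers only when more diverse references are supplied:

```python
from collections import Counter

def overlap_f1(candidate: str, references: list[str]) -> float:
    """Best unigram-overlap F1 of a candidate caption against any reference.

    Toy stand-in for reference-based caption metrics: same candidate-vs-
    references setup, but with naive exact-token matching.
    """
    cand = Counter(candidate.lower().split())
    best = 0.0
    for ref in references:
        r = Counter(ref.lower().split())
        overlap = sum((cand & r).values())  # shared token occurrences
        if overlap == 0:
            continue
        precision = overlap / sum(cand.values())
        recall = overlap / sum(r.values())
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

refs = ["a man rides a brown horse on the beach"]
candidate = "someone gallops along the shore on horseback"

# A valid paraphrase scores low against a single reference ...
print(overlap_f1(candidate, refs))  # → 0.25

# ... but recovers once a more diverse reference is available.
print(overlap_f1(candidate, refs + [
    "someone gallops along the sandy shore on horseback"]))
```

The gap between the two scores is exactly the failure mode the snippet describes: the caption did not change, only the reference pool did.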
Y Wada, K Kaneda, D Saito… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Establishing an automatic evaluation metric that closely aligns with human judgments is essential for effectively developing image captioning models. Recent data-driven metrics …
J Feinglass, Y Yang - arXiv preprint arXiv:2106.01444, 2021 - arxiv.org
The open-ended nature of visual captioning makes it a challenging area for evaluation. The majority of proposed models rely on specialized training to improve human-correlation …
If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a …
E Salin, S Ayache, B Favre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vision-language foundation models have seen considerable performance gains in the last few years. However, there is still a lack of comprehensive evaluation methods able to …
LD Narins, A Scott, A Gautam… - Advances in …, 2024 - proceedings.neurips.cc
We present a new high-quality validated image caption rating (VICR) dataset. How well a caption fits an image can be difficult to assess due to the subjective nature of caption quality …
Automatically describing images using natural sentences is essential to visually impaired people's inclusion on the Internet. This problem is known as Image Captioning. There are …
Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with …