A-OKVQA: A benchmark for visual question answering using world knowledge

D Schwenk, A Khandelwal, C Clark, K Marino… - European conference on …, 2022 - Springer
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed
for the development of AI models that can jointly reason over visual and natural language …

TextCaps: a dataset for image captioning with reading comprehension

O Sidorov, R Hu, M Rohrbach, A Singh - … 23–28, 2020, Proceedings, Part II …, 2020 - Springer
Image descriptions can help visually impaired people to quickly understand the image
content. While we have made significant progress in automatically describing images and optical …

FashionViL: Fashion-focused vision-and-language representation learning

X Han, L Yu, X Zhu, L Zhang, YZ Song… - European conference on …, 2022 - Springer
Large-scale Vision-and-Language (V+L) pre-training for representation learning
has proven to be effective in boosting various downstream V+L tasks. However, when it …

A fine-grained vision and language representation framework with graph-based fashion semantic knowledge

H Ding, S Wang, Z Xie, M Li, L Ma - Computers & Graphics, 2023 - Elsevier
Vision and language representation learning has been demonstrated to be an effective
means of enhancing multimodal task performance. However, fashion-specific studies have …

[HTML][HTML] A metamorphic testing approach for assessing question answering systems

K Tu, M Jiang, Z Ding - Mathematics, 2021 - mdpi.com
Question Answering (QA) enables the machine to understand and answer questions posed
in natural language, which has emerged as a powerful tool in various domains. However …

Improving Image Representations via MoCo Pre-training for Multimodal CXR Classification

F Dalla Serra, G Jacenków, F Deligianni… - Annual Conference on …, 2022 - Springer
Multimodal learning, here defined as learning from multiple input data types, has exciting
potential for healthcare. However, current techniques rely on large multimodal datasets …

[PDF][PDF] Amazon pars at memotion 2.0 2022: Multi-modal multi-task learning for memotion 2.0 challenge

GG Lee, M Shen - Proceedings http://ceur-ws.org ISSN, 2020 - ceur-ws.org
Over the years, memes have become very popular as social media services grow rapidly.
Understanding meme images as humans do is very complicated because of their multi-modal …

Visual Question Answering for Response Synthesis Based on Spatial Actions

G Kiselev, D Weizenfeld, Y Gorbunova - International Conference on …, 2022 - Springer
The paper considers the problem of automatically analyzing a user's natural language query about
an image. The mechanism synthesizes a logically correct non-binary response. Synthesis is …

Towards multilingual image captioning models that can read

R Gallardo García, B Beltrán Martínez… - … Conference on Artificial …, 2021 - Springer
Few current image captioning systems are capable of reading and integrating read text into the
generated descriptions, and none of them was developed to solve the problem from a bilingual …

N Yevtushenko, V Kuliamin… - Testing Software and …, 2019 - books.google.com
Homing, synchronizing and distinguishing sequences (HSs, SSs, and DSs) are used in FSM
(Finite State Machine) based testing for state identification and can significantly reduce the …