Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data. More recently, this has …
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. In this paper, we …
L Chen, Z Jiang, J Xiao, W Liu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Abstract Controllable Image Captioning (CIC)--generating image descriptions following designated control signals--has received unprecedented attention over the last few years …
Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have …
Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+ text systems, yet …
EJ Hwang, V Shwartz - arXiv preprint arXiv:2305.13703, 2023 - arxiv.org
Memes are a widely popular tool for web users to express their thoughts using visual metaphors. Understanding memes requires recognizing and interpreting visual metaphors …
Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets. In this paper we present the Crossmodal-3600 dataset …
SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various domains. AI models show good performance for definite tasks such as image classification …
C Thomas, A Kovashka - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
The abundance of multimodal data (eg social media posts) has inspired interest in cross- modal retrieval methods. Popular approaches rely on a variety of metric learning losses …