Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Breaking common sense: WHOOPS! A vision-and-language benchmark of synthetic and compositional images

N Bitton-Guetta, Y Bitton, J Hessel… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weird, unusual, and uncanny images pique the curiosity of observers because they
challenge commonsense. For example, an image released during the 2022 World Cup …

VisIT-Bench: A benchmark for vision-language instruction following inspired by real-world use

Y Bitton, H Bansal, J Hessel, R Shao, W Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of
instruction-following vision-language models for real-world use. Our starting point is curating …

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

S Zhong, Z Huang, S Gao, W Wen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Chain-of-Thought (CoT) guides large language models (LLMs) to reason step-by-step
and can motivate their logical reasoning ability. While effective for logical tasks, CoT is …

FLUTE: Figurative language understanding through textual explanations

T Chakrabarty, A Saakyan, D Ghosh… - arXiv preprint arXiv …, 2022 - arxiv.org
Figurative language understanding has been recently framed as a recognizing textual
entailment (RTE) task (aka natural language inference, or NLI). However, similar to classical …

How well do large language models perform on faux pas tests?

N Shapira, G Zwirn, Y Goldberg - Findings of the Association for …, 2023 - aclanthology.org
Motivated by the question of the extent to which large language models “understand” social
intelligence, we investigate the ability of such models to generate correct responses to …

MemeCap: A dataset for captioning and interpreting memes

EJ Hwang, V Shwartz - arXiv preprint arXiv:2305.13703, 2023 - arxiv.org
Memes are a widely popular tool for web users to express their thoughts using visual
metaphors. Understanding memes requires recognizing and interpreting visual metaphors …

WinoGAViL: Gamified association benchmark to challenge vision-and-language models

Y Bitton, N Bitton Guetta, R Yosef… - Advances in …, 2022 - proceedings.neurips.cc
While vision-and-language models perform well on tasks such as visual question
answering, they struggle when it comes to basic human commonsense reasoning skills. In …

MacGyver: Are Large Language Models Creative Problem Solvers?

Y Tian, A Ravichander, L Qin, RL Bras… - arXiv preprint arXiv …, 2023 - arxiv.org
We explore the creative problem-solving capabilities of modern large language models
(LLMs) in a constrained setting. The setting requires circumventing a cognitive bias known in …

Probing the Creativity of Large Language Models: Can models produce divergent semantic association?

H Chen, N Ding - arXiv preprint arXiv:2310.11158, 2023 - arxiv.org
Large language models possess remarkable capacity for processing language, but it
remains unclear whether these models can further generate creative content. The present …