From recognition to cognition: Visual commonsense reasoning

R Zellers, Y Bisk, A Farhadi… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …

Broaden the vision: Geo-diverse visual commonsense reasoning

D Yin, LH Li, Z Hu, N Peng, KW Chang - arXiv preprint arXiv:2109.06860, 2021 - arxiv.org
Commonsense is defined as the knowledge that is shared by everyone. However, certain
types of commonsense knowledge are correlated with culture and geographic locations and …

VisualCOMET: Reasoning about the dynamic context of a still image

JS Park, C Bhagavatula, R Mottaghi, A Farhadi… - Computer Vision–ECCV …, 2020 - Springer
Even from a single frame of a still image, people can reason about the dynamic story of the
image before, after, and beyond the frame. For example, given an image of a man struggling …

Breaking common sense: WHOOPS! A vision-and-language benchmark of synthetic and compositional images

N Bitton-Guetta, Y Bitton, J Hessel… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weird, unusual, and uncanny images pique the curiosity of observers because they
challenge commonsense. For example, an image released during the 2022 world cup …

CAT: A contextualized conceptualization and instantiation framework for commonsense reasoning

W Wang, T Fang, B Xu, CYL Bo, Y Song… - arXiv preprint arXiv …, 2023 - arxiv.org
Commonsense reasoning, aiming at endowing machines with a human-like ability to make
situational presumptions, is extremely challenging to generalize. For someone who barely …

Compositional attention networks for machine reasoning

DA Hudson, CD Manning - arXiv preprint arXiv:1803.03067, 2018 - arxiv.org
We present the MAC network, a novel fully differentiable neural network architecture,
designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic …

RAVEN: A dataset for relational and analogical visual reasoning

C Zhang, F Gao, B Jia, Y Zhu… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Dramatic progress has been witnessed in basic vision tasks involving low-level perception,
such as object recognition, detection, and tracking. Unfortunately, there is still enormous …

Explainable and explicit visual reasoning over scene graphs

J Shi, H Zhang, J Li - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
We aim to dismantle the prevalent black-box neural architectures used in complex visual
reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which …

MERLOT: Multimodal neural script knowledge models

R Zellers, X Lu, J Hessel, Y Yu… - Advances in neural …, 2021 - proceedings.neurips.cc
As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …

MERLOT Reserve: Neural script knowledge through vision and language and sound

R Zellers, J Lu, X Lu, Y Yu, Y Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com
As humans, we navigate a multimodal world, building a holistic understanding from all our
senses. We introduce MERLOT Reserve, a model that represents videos jointly over time …