Related articles - Academic Resource Search

From recognition to cognition: Visual commonsense reasoning

R Zellers, Y Bisk, A Farhadi… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …

Cited by 872 · Related articles · All 7 versions

[PDF] arxiv.org

Broaden the vision: Geo-diverse visual commonsense reasoning

D Yin, LH Li, Z Hu, N Peng, KW Chang - arXiv preprint arXiv:2109.06860, 2021 - arxiv.org

Commonsense is defined as the knowledge that is shared by everyone. However, certain
types of commonsense knowledge are correlated with culture and geographic locations and …

Cited by 37 · Related articles · All 5 versions

[PDF] arxiv.org

Visualcomet: Reasoning about the dynamic context of a still image

JS Park, C Bhagavatula, R Mottaghi, A Farhadi… - Computer Vision–ECCV …, 2020 - Springer

Even from a single frame of a still image, people can reason about the dynamic story of the
image before, after, and beyond the frame. For example, given an image of a man struggling …

Cited by 120 · Related articles · All 5 versions

[PDF] thecvf.com

Breaking common sense: Whoops! a vision-and-language benchmark of synthetic and compositional images

N Bitton-Guetta, Y Bitton, J Hessel… - Proceedings of the …, 2023 - openaccess.thecvf.com

Weird, unusual, and uncanny images pique the curiosity of observers because they
challenge commonsense. For example, an image released during the 2022 world cup …

Cited by 32 · Related articles · All 6 versions

[PDF] arxiv.org

CAT: A contextualized conceptualization and instantiation framework for commonsense reasoning

W Wang, T Fang, B Xu, CYL Bo, Y Song… - arXiv preprint arXiv …, 2023 - arxiv.org

Commonsense reasoning, aiming at endowing machines with a human-like ability to make
situational presumptions, is extremely challenging to generalize. For someone who barely …

Cited by 22 · Related articles · All 5 versions

[PDF] arxiv.org

Compositional attention networks for machine reasoning

DA Hudson, CD Manning - arXiv preprint arXiv:1803.03067, 2018 - arxiv.org

We present the MAC network, a novel fully differentiable neural network architecture,
designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic …

Cited by 609 · Related articles · All 8 versions

[PDF] thecvf.com

Raven: A dataset for relational and analogical visual reasoning

C Zhang, F Gao, B Jia, Y Zhu… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Dramatic progress has been witnessed in basic vision tasks involving low-level perception,
such as object recognition, detection, and tracking. Unfortunately, there is still enormous …

Cited by 269 · Related articles · All 13 versions

[PDF] thecvf.com

Explainable and explicit visual reasoning over scene graphs

J Shi, H Zhang, J Li - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

We aim to dismantle the prevalent black-box neural architectures used in complex visual
reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which …

Cited by 236 · Related articles · All 8 versions

[PDF] neurips.cc

Merlot: Multimodal neural script knowledge models

R Zellers, X Lu, J Hessel, Y Yu… - Advances in neural …, 2021 - proceedings.neurips.cc

As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …

Cited by 348 · Related articles · All 7 versions

[PDF] thecvf.com

Merlot reserve: Neural script knowledge through vision and language and sound

R Zellers, J Lu, X Lu, Y Yu, Y Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com

As humans, we navigate a multimodal world, building a holistic understanding from all our
senses. We introduce MERLOT Reserve, a model that represents videos jointly over time …

Cited by 213 · Related articles · All 9 versions
