The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

J Li, D Li, C Xiong, S Hoi - International conference on …, 2022 - proceedings.mlr.press
Vision-Language Pre-training (VLP) has advanced the performance for many vision-
language tasks. However, most existing pre-trained models only excel in either …

Prompt programming for large language models: Beyond the few-shot paradigm

L Reynolds, K McDonell - Extended abstracts of the 2021 CHI …, 2021 - dl.acm.org
Prevailing methods for mapping large generative language models to supervised tasks may
fail to sufficiently probe models' novel capabilities. Using GPT-3 as a case study, we show …

Recent advances in deep learning based dialogue systems: A systematic survey

J Ni, T Young, V Pandelea, F Xue… - Artificial intelligence review, 2023 - Springer
Dialogue systems are a popular natural language processing (NLP) task, as they are promising in
real-life applications. It is also a complicated task since many NLP tasks deserving study are …

Large-scale adversarial training for vision-and-language representation learning

Z Gan, YC Chen, L Li, C Zhu… - Advances in Neural …, 2020 - proceedings.neurips.cc
We present VILLA, the first known effort on large-scale adversarial training for vision-and-
language (V+L) representation learning. VILLA consists of two training stages: (i) task …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Two causal principles for improving visual dialog

J Qi, Y Niu, J Huang, H Zhang - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for
Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial) …

UTC: A unified transformer with inter-task contrastive learning for visual dialog

C Chen, Z Tan, Q Cheng, X Jiang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual Dialog aims to answer multi-round, interactive questions based on the dialog history
and image content. Existing methods either consider answer ranking and generating …

Large-scale pretraining for visual dialog: A simple state-of-the-art baseline

V Murahari, D Batra, D Parikh, A Das - European Conference on Computer …, 2020 - Springer
Prior work in visual dialog has focused on training deep neural models on VisDial in
isolation. Instead, we present an approach to leverage pretraining on related vision …