Multimodal graph networks for compositional generalization in visual question answering

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 40 · Related articles

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 143 · Related articles · All 2 versions

[PDF] neurips.cc

Large language models as commonsense knowledge for large-scale task planning

Z Zhao, WS Lee, D Hsu - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Large-scale task planning is a major challenge. Recent work exploits large language
models (LLMs) directly as a policy and shows surprisingly interesting results. This paper …

Cited by 143 · Related articles · All 7 versions

[PDF] thecvf.com

Multi-level feature learning for contrastive multi-view clustering

J Xu, H Tang, Y Ren, L Peng… - Proceedings of the …, 2022 - openaccess.thecvf.com

Multi-view clustering can explore common semantics from multiple views and has attracted
increasing attention. However, existing works punish multiple objectives in the same feature …

Cited by 227 · Related articles · All 6 versions

[PDF] arxiv.org

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

Cited by 55 · Related articles · All 3 versions

[PDF] neurips.cc

Debiased visual question answering from feature and sample perspectives

Z Wen, G Xu, M Tan, Q Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc

Visual question answering (VQA) is designed to examine the visual-textual reasoning ability
of an intelligent agent. However, recent observations show that many VQA models may only …

Cited by 71 · Related articles · All 9 versions

[PDF] arxiv.org

Modular visual question answering via code generation

S Subramanian, M Narasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org

We present a framework that formulates visual question answering as modular code
generation. In contrast to prior work on modular approaches to VQA, our approach requires …

Cited by 43 · Related articles · All 7 versions

[PDF] cityu.edu.hk

Multimodal graph learning based on 3D Haar semi-tight framelet for student engagement prediction

M Li, X Zhuang, L Bai, W Ding - Information Fusion, 2024 - Elsevier

With the increasing availability of multimodal educational data, there is a growing need to
effectively integrate and exploit multiple data sources to enhance student engagement …

Cited by 25 · Related articles · All 4 versions

[PDF] arxiv.org

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 30 · Related articles · All 2 versions

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

Cited by 17 · Related articles · All 2 versions