Answering, Fast and Slow: Strategy enhancement of visual understanding guided by causality

C Wang, Z Wang, Y Zhou - Neurocomputing, 2025 - Elsevier
In his classic book Thinking, Fast and Slow (Daniel, 2017), Kahneman points out that human
thinking can be categorized into two main modes: a system that displays intuition …

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor

J Chen, X Hei, Y Xue, Y Wei, J Xie, Y Cai… - Proceedings of the 32nd …, 2024 - dl.acm.org
Large multimodal models (LMMs) have shown remarkable performance in the visual
commonsense reasoning (VCR) task, which aims to answer a multiple-choice question …

Exploring the Answering Capability of Large Language Models in Addressing Complex Knowledge in Entrepreneurship Education

Q Lang, S Tian, M Wang, J Wang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Entrepreneurship education is critical in encouraging students' innovation, creativity, and
entrepreneurial spirit. It provides essential skills and knowledge, enabling them to open their …

FGLNet: frequency global and local context channel attention networks

Y Liu, Y Liu, H Li, J Zhang - Applied Intelligence, 2024 - Springer
The application of attention mechanisms, especially channel attention, has achieved huge
success in the field of computer vision. However, existing methods mainly focus on more …

Multimodal Relational Triple Extraction with Query-based Entity Object Transformer

L Hei, N An, T Liao, Q Ma, J Wang, F Ren - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge
graphs. Recent studies focus on extracting the relation type with entity pairs present in …