LRCN: Layer-residual Co-Attention Networks for visual question answering

D Han, J Shi, J Zhao, H Wu, Y Zhou, LH Li… - Expert Systems with …, 2025 - Elsevier
Visual Question Answering (VQA) is a multimodal task requiring a collaborative
understanding of fine-grained visual concepts and language semantics. The key to VQA …

KTMN: Knowledge-driven Two-stage Modulation Network for visual question answering

J Shi, D Han, C Chen, X Shen - Multimedia Systems, 2024 - Springer
Existing visual question answering (VQA) methods introduce the Transformer as the
backbone architecture for intra- and inter-modal interactions, demonstrating its effectiveness …

SAFFNet: self-attention based on Fourier frequency domain filter network for visual question answering

J Shi, D Han, C Chen, X Shen - The Visual Computer, 2025 - Springer
The focus point of visual question answering (VQA) tasks is how to reasonably and
effectively model the relationship between complex information like images. The mainstream …

A lightweight Transformer-based visual question answering network with Weight-Sharing Hybrid Attention

Y Zhu, D Chen, T Jia, S Deng - Neurocomputing, 2024 - Elsevier
Recent advances show that Transformer-based models and object detection-based models
play an indispensable role in VQA. However, object detection-based models have …

Enhancing few-shot KB-VQA with panoramic image captions guided by Large Language Models

P Qiang, H Tan, X Li, D Wang, R Li, X Sun, H Zhang… - Neurocomputing, 2025 - Elsevier
Current state-of-the-art (SOTA) KB-VQA techniques involve transforming images into image
captions as prompts to harness the potent reasoning capabilities of large language models …

Heterogeneous-branch integration framework: Introducing first-order predicate logic in Logical Reasoning Question Answering

J Yue, X Bi, Z Chen - Neurocomputing, 2024 - Elsevier
The logical reasoning question-answering is a critical task in natural language processing,
as it equips models with human-like logical reasoning intelligence. Existing approaches …

Improving Time Series Regression Model Accuracy via Systematic Training Dataset Augmentation and Sampling

R Ströbel, M Mau, A Puchta, J Fleischer - Machine Learning and …, 2024 - mdpi.com
This study addresses a significant gap in the field of time series regression modeling by
highlighting the central role of data augmentation in improving model accuracy. The primary …

Hierarchical synchronization with structured multi-granularity interaction for video question answering

S Qi, L Yang, C Li - Neurocomputing, 2024 - Elsevier
Video Question Answering (VideoQA) requires a thorough comprehension of
linguistic and visual modalities. However, recent methods confront two problems: (1) …

Adaptify: A Refined Adaptation Scheme for Frame Classification in Atrophic Gastritis Videos

Z Xiong, S Chen, Y Zhang, Y Cao, B Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Atrophic gastritis is a significant risk factor for developing gastric cancer. The incorporation of
machine learning algorithms can efficiently elevate the possibility of accurately detecting …

Adaptify: A Refined Test-Time Adaptation Scheme for Frame Classification Consistency in Atrophic Gastritis Videos

Z Xiong, S Chen, Y Zhang, Y Cao… - … on Biomedical Imaging …, 2024 - ieeexplore.ieee.org
Atrophic gastritis is a significant risk factor for developing gastric cancer. The incorporation of
machine learning algorithms can efficiently elevate the possibility of accurately detecting …