Overcoming language priors in visual question answering with cumulative learning strategy

A Mao, F Chen, Z Ma, K Lin - Neurocomputing, 2024 - Elsevier
The performance of visual question answering (VQA) has witnessed great progress over the
last few years. However, many current VQA models tend to rely on superficial linguistic …

Diabetic retinopathy detection and severity classification using optimized deep learning with explainable AI technique

B Lalithadevi, S Krishnaveni - Multimedia Tools and Applications, 2024 - Springer
Diabetic Retinopathy (DR) is a serious consequence of prolonged diabetic patients that
causes vision threatening issues and irreversible blindness problem. In the beginning stage …

Exploring the State-of-the-Art in Multi-Object Tracking: A Comprehensive Survey, Evaluation, Challenges, and Future Directions

C Du, C Lin, R Jin, B Chai, Y Yao, S Su - Multimedia Tools and …, 2024 - Springer
Multiple object tracking (MOT), as a typical application scenario of computer vision, has
attracted significant attention from both academic and industrial communities. With its rapid …

Relation-aware Multi-pass Comparison Deconfounded Network for Change Captioning

Z Lu, L Jin, Z Chen, C Tian, X Sun, X Li… - … on Circuits and …, 2024 - ieeexplore.ieee.org
Change captioning aims to describe the semantic change between a pair of images with
natural language while remaining immune to viewpoint change. Based on the encoder …

A simple transformer-based baseline for crowd tracking with Sequential Feature Aggregation and Hybrid Group Training

C Wang, Z Wu, W Ke, Z Xiong - Journal of Visual Communication and …, 2024 - Elsevier
Tracking pedestrians in crowded scenes is a challenging task. Existing transformer-based
tracking methods integrate detection and tracking into a unified model, which simplifies the …

Adept: Annotation-denoising auxiliary tasks with discrete cosine transform map and keypoint for human-centric pretraining

W He, Y Yan, S Tang, Y Deng, Y Zhong, P Luo, D Qi - Neurocomputing, 2025 - Elsevier
Human-centric perception is the core of diverse computer vision tasks and has been a
long-standing research focus. However, previous research studied these human-centric tasks …

ALDII: Adaptive Learning-based Document Image Inpainting to enhance the handwritten Chinese character legibility of human and machine

Q Mao, J Li, H Zhou, P Kar, AG Bellotti - Neurocomputing, 2025 - Elsevier
Document Image Inpainting (DII) has been applied to degraded documents,
including financial and historical documents, to enhance the legibility of images for: (1) …

Quantum convolution neural network for multi-nutrient detection and stress identification in plant leaves

K Venkatesh, KJ Naik, A Shankar - Multimedia Tools and Applications, 2024 - Springer
Nutrient stress can impose significant metabolic strain on plants, resulting in declining
agricultural productivity. Nitrogen, phosphorus, and potassium are essential growth-limiting …

Runner re-identification from single-view running video in the open-world setting

T Suzuki, K Tsutsui, K Takeda, K Fujii - Multimedia Tools and Applications, 2024 - Springer
In many sports, player re-identification is crucial for automatic video processing and
analysis. However, most of the current studies on player re-identification in multi- or single …

Differential motion attention network for efficient action recognition

C Liu, F Gu - The Visual Computer, 2024 - Springer
Despite the great progresses achieved by commonly-used 3D CNNs and two-stream
methods in action recognition, they cause heavy computational burden which are inefficient …