Counterfactual visual dialog: Robust commonsense knowledge learning from unbiased training

AA Liu, C Huang, N Xu, H Tian, J Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual Dialog (VD) requires an agent to answer the current question by engaging in a
conversation with humans referring to an image. Despite the recent progress, it is beneficial …

Selective arguments representation with dual relation-aware network for video situation recognition

W Liu, Q He, C Wang, Y Peng, S Xie - Neural Computing and Applications, 2024 - Springer
Argument visual states are helpful for detecting structured components of events in videos,
and existing methods tend to use object detectors to generate their candidates. However …

Hierarchical Synergy-Enhanced Multimodal Relational Network for Video Question Answering

M Peng, X Shao, Y Shi, X Zhou - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Video question answering (VideoQA) is challenging as it requires reasoning about natural
language and multimodal interactive relations. Most existing methods apply attention …

HSDreport: Heart Sound Diagnosis with Echocardiography Reports

Z Zhao, P Wang, L Zhao, Y Yang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Heart sound auscultation holds significant importance in the diagnosis of congenital heart
disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are …