Revisiting Counterfactual Problems in Referring Expression Comprehension

Z Yu, R Li - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Traditional referring expression comprehension (REC) aims to locate the target referent in
an image guided by a text query. Several previous methods have studied the …

A Masked Reference Token Supervision based Iterative Visual-language Framework for Robust Visual Grounding

C Wang, W Feng, S Lyu, G Cheng, X Li… - … on Circuits and …, 2024 - ieeexplore.ieee.org
Visual Grounding (VG) has become a prominent task in recent years, achieving significant
advancements with the development of detection and vision transformers. However, existing …

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

M Zheng, J Zhang, Q Chen, Y Peng, Y Liu - arXiv preprint arXiv …, 2024 - arxiv.org
Visual grounding aims to localize the object referred to in an image based on a natural
language query. Although progress has been made recently, accurately localizing target …

MFSC: A Multimodal Aspect-Level Sentiment Classification Framework with Multi-Image Gate and Fusion Networks

L Zi, X Pan, X Cong - Electronics, 2024 - mdpi.com
Currently, there is a great deal of interest in multimodal aspect-level sentiment classification
using both textual and visual information, which changes the traditional use of only single …