Multiple relational learning network for joint referring expression comprehension and segmentation

G Hua, M Liao, S Tian, Y Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Multi-task learning is a successful learning framework which improves the performance of
prediction models by leveraging knowledge among related tasks. Referring expression …

Local-global coordination with transformers for referring image segmentation

F Liu, Y Kong, L Zhang, G Feng, B Yin - Neurocomputing, 2023 - Elsevier
Referring image segmentation has sprung up benefiting from the outstanding performance
of deep neural networks. However, most existing methods explore either local details or the …

Fully and weakly supervised referring expression segmentation with end-to-end learning

H Li, M Sun, J Xiao, EG Lim… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Referring Expression Segmentation (RES), which is aimed at localizing and segmenting the
target according to the given language expression, has drawn increasing attention. Existing …

Referring expression comprehension via enhanced cross-modal graph attention networks

J Wang, J Ke, HH Shuai, YH Li, WH Cheng - ACM Transactions on …, 2023 - dl.acm.org
Referring expression comprehension aims to localize a specific object in an image
according to a given language description. It is still challenging to comprehend and mitigate …

Dual-graph hierarchical interaction network for referring image segmentation

Z Shi, Q Wu, H Li, F Meng, KN Ngan - Displays, 2023 - Elsevier
Referring Image Segmentation (RIS) aims to extract the object or stuff from an
image according to the given natural language expression. As a representative multi-modal …

Cross-modal transformer with language query for referring image segmentation

W Zhang, Q Tan, P Li, Q Zhang, R Wang - Neurocomputing, 2023 - Elsevier
Referring image segmentation (RIS) aims to predict a segmentation mask for a target
specified by a natural language expression. However, the existing methods failed to …

Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension

Y Zhang, Q Li, Y Pan, X Zhao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Video-based referring expression comprehension is a challenging task that requires
locating the referred object in each video frame of a given video. While many existing …

Area-keywords cross-modal alignment for referring image segmentation

H Zhang, L Wang, S Li, K Xu, B Yin - Neurocomputing, 2024 - Elsevier
Referring image segmentation aims to segment the instance corresponding to the given
language description, which requires aligning information from two modalities. Existing …

Unpaired referring expression grounding via bidirectional cross-modal matching

H Shi, M Hayat, J Cai - Neurocomputing, 2023 - Elsevier
Referring expression grounding is an important and challenging task in computer vision. To
avoid the laborious annotation in conventional referring grounding, unpaired referring …

Exposing the Troublemakers in Described Object Detection

C Xie, Z Zhang, Y Wu, F Zhu, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Detecting objects based on language descriptions is a popular task that includes Open-
Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this …