Fully transformer-equipped architecture for end-to-end referring video object segmentation

P Li, Y Zhang, L Yuan, X Xu - Information Processing & Management, 2024 - Elsevier
Abstract Referring Video Object Segmentation (RVOS) requires segmenting the object in
video referred by a natural language query. Existing methods mainly rely on sophisticated …

End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching

C Chen, P Li, M Sun, Y Liu - … of the 2022 Conference on Empirical …, 2022 - aclanthology.org
Recently there has been an emerging interest in unsupervised vision-and-language pre-
training (VLP) that learns multimodal representations without parallel image-caption data …

Manufacturing domain instruction comprehension using synthetic data

K Johari, CTZ Tong, R Bhardwaj, V Subbaraju… - The Visual …, 2024 - Springer
Referring expression comprehension (REC) system solves a task to localize objects in a
given image, based on natural language expression. We propose a novel approach to …

[CITATION][C] Learning referring expression segmentation through training data augmentation

설한울, 강기천, 김정현, 장병탁 - Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Conference, 2022 - dbpia.co.kr
Abstract Referring Expression Segmentation (RES) takes as input an image and a text
referring to an object in that image and, through a segmentation mask, the referred object …