Referring video object segmentation aims to predict foreground labels for objects referred by natural language expressions in videos. Previous methods either depend on 3D ConvNets …
Video prediction is a challenging task due to its nature of uncertainty, especially for forecasting a long period. To model the temporal dynamics, advanced methods benefit from …
Generating a video given the first several static frames is challenging as it anticipates reasonable future frames with temporal coherence. Besides video prediction, the ability to …
TJ Fu, XE Wang, WY Wang - European Conference on Computer Vision, 2022 - Springer
Despite having promising results, style transfer, which requires preparing style images in advance, may result in lack of creativity and accessibility. Following human instruction, on …
D Kim, N Kim, C Lan, S Kwak - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training …
For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images …
G Li, M Gao, H Liu, X Zhen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene. However, in more realistic scenarios, only …
P Li, Y Zhang, L Yuan, X Xu - Information Processing & Management, 2024 - Elsevier
Referring Video Object Segmentation (RVOS) requires segmenting the object in video referred by a natural language query. Existing methods mainly rely on sophisticated …
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet …