AltFreezing for more general video face forgery detection

Z Wang, J Bao, W Zhou, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing face forgery detection models try to discriminate fake images by detecting only
spatial artifacts (e.g., generative artifacts, blending) or mainly temporal artifacts (e.g., flickering …

Language-bridged spatial-temporal interaction for referring video object segmentation

Z Ding, T Hui, J Huang, X Wei… - Proceedings of the …, 2022 - openaccess.thecvf.com
Referring video object segmentation aims to predict foreground labels for objects referred by
natural language expressions in videos. Previous methods either depend on 3D ConvNets …

ExtDM: Distribution extrapolation diffusion model for video prediction

Z Zhang, J Hu, W Cheng, D Paudel… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video prediction is a challenging task due to its nature of uncertainty especially for
forecasting a long period. To model the temporal dynamics advanced methods benefit from …

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

Language-driven artistic style transfer

TJ Fu, XE Wang, WY Wang - European Conference on Computer Vision, 2022 - Springer
Despite having promising results, style transfer, which requires preparing style images in
advance, may result in lack of creativity and accessibility. Following human instruction, on …

Shatter and gather: Learning referring image segmentation with text supervision

D Kim, N Kim, C Lan, S Kwak - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Referring image segmentation, the task of segmenting any arbitrary entities described in
free-form texts, opens up a variety of vision applications. However, manual labeling of training …

A review of multi-modal learning from the text-guided visual processing viewpoint

U Ullah, JS Lee, CH An, H Lee, SY Park, RH Baek… - Sensors, 2022 - mdpi.com
For decades, co-relating different data domains to attain the maximum potential of machines
has driven research, especially in neural networks. Similarly, text and visual data (images …

Learning cross-modal affinity for referring video object segmentation targeting limited samples

G Li, M Gao, H Liu, X Zhen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Referring video object segmentation (RVOS), as a supervised learning task, relies on
sufficient annotated data for a given scene. However, in more realistic scenarios, only …

Fully transformer-equipped architecture for end-to-end referring video object segmentation

P Li, Y Zhang, L Yuan, X Xu - Information Processing & Management, 2024 - Elsevier
Referring Video Object Segmentation (RVOS) requires segmenting the object in
video referred by a natural language query. Existing methods mainly rely on sophisticated …

Multimodal dialog systems with dual knowledge-enhanced generative pretrained language model

X Chen, X Song, L Jing, S Li, L Hu, L Nie - ACM Transactions on …, 2023 - dl.acm.org
Text response generation for multimodal task-oriented dialog systems, which aims to
generate the proper text response given the multimodal context, is an essential yet …