- Academic resource search

AltFreezing for more general video face forgery detection

Z Wang, J Bao, W Zhou, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Existing face forgery detection models try to discriminate fake images by detecting only
spatial artifacts (e.g., generative artifacts, blending) or mainly temporal artifacts (e.g., flickering …

Cited by 78 · Related articles · All 5 versions

[PDF] thecvf.com

Language-bridged spatial-temporal interaction for referring video object segmentation

Z Ding, T Hui, J Huang, X Wei… - Proceedings of the …, 2022 - openaccess.thecvf.com

Referring video object segmentation aims to predict foreground labels for objects referred by
natural language expressions in videos. Previous methods either depend on 3D ConvNets …

Cited by 65 · Related articles · All 11 versions

[PDF] thecvf.com

ExtDM: Distribution extrapolation diffusion model for video prediction

Z Zhang, J Hu, W Cheng, D Paudel… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video prediction is a challenging task due to its nature of uncertainty especially for
forecasting a long period. To model the temporal dynamics advanced methods benefit from …

Cited by 16 · Related articles · All 3 versions

[PDF] thecvf.com

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com

Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

Cited by 39 · Related articles · All 8 versions

[PDF] ucsb.edu

Language-driven artistic style transfer

TJ Fu, XE Wang, WY Wang - European Conference on Computer Vision, 2022 - Springer

Despite having promising results, style transfer, which requires preparing style images in
advance, may result in lack of creativity and accessibility. Following human instruction, on …

Cited by 49 · Related articles · All 3 versions

[PDF] thecvf.com

Shatter and gather: Learning referring image segmentation with text supervision

D Kim, N Kim, C Lan, S Kwak - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Referring image segmentation, the task of segmenting any arbitrary entities described in free-
form texts, opens up a variety of vision applications. However, manual labeling of training …

Cited by 18 · Related articles · All 6 versions

[PDF] mdpi.com

A review of multi-modal learning from the text-guided visual processing viewpoint

U Ullah, JS Lee, CH An, H Lee, SY Park, RH Baek… - Sensors, 2022 - mdpi.com

For decades, co-relating different data domains to attain the maximum potential of machines
has driven research, especially in neural networks. Similarly, text and visual data (images …

Cited by 8 · Related articles · All 7 versions

[PDF] thecvf.com

Learning cross-modal affinity for referring video object segmentation targeting limited samples

G Li, M Gao, H Liu, X Zhen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Referring video object segmentation (RVOS), as a supervised learning task, relies on
sufficient annotated data for a given scene. However, in more realistic scenarios, only …

Cited by 3 · Related articles · All 5 versions

[PDF] arxiv.org

Fully transformer-equipped architecture for end-to-end referring video object segmentation

P Li, Y Zhang, L Yuan, X Xu - Information Processing & Management, 2024 - Elsevier

Referring Video Object Segmentation (RVOS) requires segmenting the object in
video referred by a natural language query. Existing methods mainly rely on sophisticated …

Cited by 6 · Related articles · All 4 versions

[PDF] arxiv.org

Multimodal dialog systems with dual knowledge-enhanced generative pretrained language model

X Chen, X Song, L Jing, S Li, L Hu, L Nie - ACM Transactions on …, 2023 - dl.acm.org

Text response generation for multimodal task-oriented dialog systems, which aims to
generate the proper text response given the multimodal context, is an essential yet …

Cited by 17 · Related articles · All 3 versions
