Open-vocabulary video anomaly detection

P Wu, X Zhou, G Pang, Y Sun, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current video anomaly detection (VAD) approaches with weak supervisions are inherently
limited to a closed-set setting and may struggle in open-world applications where there can …

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Fine-grained textual inversion network for zero-shot composed image retrieval

H Lin, H Wen, X Song, M Liu, Y Hu, L Nie - Proceedings of the 47th …, 2024 - dl.acm.org
Composed Image Retrieval (CIR) allows users to search target images with a multimodal
query, comprising a reference image and a modification text that describes the user's …

Adapting generative pretrained language model for open-domain multimodal sentence summarization

D Lin, L Jing, X Song, M Liu, T Sun, L Nie - Proceedings of the 46th …, 2023 - dl.acm.org
Multimodal sentence summarization, aiming to generate a brief summary of the source
sentence and image, is a new yet challenging task. Although existing methods have …

Uncertainty-aware alignment network for cross-domain video-text retrieval

X Hao, W Zhang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Video-text retrieval is an important but challenging research task in the multimedia
community. In this paper, we address the challenging task of Unsupervised Domain …

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

J Wang, G Sun, P Wang, D Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The increasing prevalence of video clips has sparked growing interest in text-video retrieval.
Recent advances focus on establishing a joint embedding space for text and video relying …

Audio-enhanced text-to-video retrieval using text-conditioned feature alignment

S Ibrahimi, X Sun, P Wang, A Garg… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-to-video retrieval systems have recently made significant progress by utilizing
pre-trained models trained on large-scale image-text pairs. However, most of the latest methods …

Text-to-motion retrieval: Towards joint understanding of human motion data and natural language

N Messina, J Sedmidubsky, F Falchi… - Proceedings of the 46th …, 2023 - dl.acm.org
Due to recent advances in pose-estimation methods, human motion can be extracted from a
common video in the form of 3D skeleton sequences. Despite wonderful application …

Text-Video Retrieval via Multi-Modal Hypergraph Networks

Q Li, L Su, J Zhao, L Xia, H Cai, S Cheng… - Proceedings of the 17th …, 2024 - dl.acm.org
Text-video retrieval is a challenging task that aims to identify relevant videos given textual
queries. Compared to conventional textual retrieval, the main obstacle for text-video retrieval …