Teachtext: Crossmodal generalized distillation for text-video retrieval

I Croitoru, SV Bogolin, M Leordeanu… - Proceedings of the …, 2021 - openaccess.thecvf.com
In recent years, considerable progress on the task of text-video retrieval has been achieved
by leveraging large-scale pretraining on visual and audio datasets to construct powerful …

Cross modal retrieval with querybank normalisation

SV Bogolin, I Croitoru, H Jin, Y Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Profiting from large-scale training datasets, advances in neural architecture design and
efficient inference, joint embeddings have become the dominant approach for tackling cross …

Exposing and mitigating spurious correlations for cross-modal retrieval

JM Kim, A Koepke, C Schmid… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Cross-modal retrieval methods are the preferred tool to search databases for the text that
best matches a query image and vice versa. However, image-text retrieval models commonly …

A survey on visual content-based video indexing and retrieval

W Hu, N Xie, L Li, X Zeng… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Video indexing and retrieval have a wide spectrum of promising applications, motivating the
interest of researchers worldwide. This paper offers a tutorial and an overview of the …

Hierarchical semantic indexing for large scale image retrieval

J Deng, AC Berg, L Fei-Fei - CVPR 2011, 2011 - ieeexplore.ieee.org
This paper addresses the problem of similar image retrieval, especially in the setting of large-scale
datasets with millions to billions of images. The core novel contribution is an approach …

Local correspondence network for weakly supervised temporal sentence grounding

W Yang, T Zhang, Y Zhang, F Wu - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org
Weakly supervised temporal sentence grounding has better scalability and practicability
than fully supervised methods in real-world application scenarios. However, most existing …

High-level event recognition in unconstrained videos

YG Jiang, S Bhattacharya, SF Chang… - International journal of …, 2013 - Springer
The goal of high-level event recognition is to automatically detect complex high-level events
in a given video sequence. This is a difficult task especially when videos are captured under …

Visual semantic search: Retrieving videos via complex textual queries

D Lin, S Fidler, C Kong… - Proceedings of the IEEE …, 2014 - openaccess.thecvf.com
In this paper, we tackle the problem of retrieving videos using complex natural language
queries. Towards this goal, we first parse the sentential descriptions into a semantic graph …

Zero-shot event detection using multi-modal fusion of weakly supervised concepts

S Wu, S Bondugula, F Luisier… - Proceedings of the …, 2014 - openaccess.thecvf.com
Current state-of-the-art systems for visual content analysis require large training sets for
each class of interest, and performance degrades rapidly with fewer examples. In this paper …

Find and focus: Retrieve and localize video events with natural language queries

D Shao, Y Xiong, Y Zhao, Q Huang… - Proceedings of the …, 2018 - openaccess.thecvf.com
The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid
growth in video duration and content diversity. Meeting such challenges calls for new …