Utilizing semantic word similarity measures for video retrieval

I Croitoru, SV Bogolin, M Leordeanu… - Proceedings of the …, 2021 - openaccess.thecvf.com

In recent years, considerable progress on the task of text-video retrieval has been achieved
by leveraging large-scale pretraining on visual and audio datasets to construct powerful …

Cited by 138 · Related articles · All 11 versions

[PDF] thecvf.com

Cross modal retrieval with querybank normalisation

SV Bogolin, I Croitoru, H Jin, Y Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Profiting from large-scale training datasets, advances in neural architecture design and
efficient inference, joint embeddings have become the dominant approach for tackling cross …

Cited by 73 · Related articles · All 5 versions

[PDF] thecvf.com

Exposing and mitigating spurious correlations for cross-modal retrieval

JM Kim, A Koepke, C Schmid… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Cross-modal retrieval methods are the preferred tool to search databases for the text that
best matches a query image and vice versa. However, image-text retrieval models commonly …

Cited by 20 · Related articles · All 5 versions

[PDF] ia.ac.cn

A survey on visual content-based video indexing and retrieval

W Hu, N Xie, L Li, X Zeng… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org

Video indexing and retrieval have a wide spectrum of promising applications, motivating the
interest of researchers worldwide. This paper offers a tutorial and an overview of the …

Cited by 817 · Related articles · All 13 versions

[PDF] psu.edu

Hierarchical semantic indexing for large scale image retrieval

J Deng, AC Berg, L Fei-Fei - CVPR 2011, 2011 - ieeexplore.ieee.org

This paper addresses the problem of similar image retrieval, especially in the setting of
large-scale datasets with millions to billions of images. The core novel contribution is an approach …

Cited by 300 · Related articles · All 20 versions

Local correspondence network for weakly supervised temporal sentence grounding

W Yang, T Zhang, Y Zhang, F Wu - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org

Weakly supervised temporal sentence grounding has better scalability and practicability
than fully supervised methods in real-world application scenarios. However, most of existing …

Cited by 65 · Related articles · All 5 versions

[PDF] springer.com

High-level event recognition in unconstrained videos

YG Jiang, S Bhattacharya, SF Chang… - International journal of …, 2013 - Springer

The goal of high-level event recognition is to automatically detect complex high-level events
in a given video sequence. This is a difficult task especially when videos are captured under …

Cited by 253 · Related articles · All 15 versions

[PDF] thecvf.com

Visual semantic search: Retrieving videos via complex textual queries

D Lin, S Fidler, C Kong… - Proceedings of the IEEE …, 2014 - openaccess.thecvf.com

In this paper, we tackle the problem of retrieving videos using complex natural language
queries. Towards this goal, we first parse the sentential descriptions into a semantic graph …

Cited by 176 · Related articles · All 15 versions

[PDF] thecvf.com

Zero-shot event detection using multi-modal fusion of weakly supervised concepts

S Wu, S Bondugula, F Luisier… - Proceedings of the …, 2014 - openaccess.thecvf.com

Current state-of-the-art systems for visual content analysis require large training sets for
each class of interest, and performance degrades rapidly with fewer examples. In this paper …

Cited by 148 · Related articles · All 11 versions

[PDF] thecvf.com

Find and focus: Retrieve and localize video events with natural language queries

D Shao, Y Xiong, Y Zhao, Q Huang… - Proceedings of the …, 2018 - openaccess.thecvf.com

The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid
growth in video duration and content diversity. Meeting such challenges calls for new …

Cited by 82 · Related articles · All 6 versions
