Dual encoding for video retrieval by text

J Dong, X Li, C Xu, X Yang, G Yang… - … on Pattern Analysis …, 2021 - ieeexplore.ieee.org
This paper attacks the challenging problem of video retrieval by text. In such a retrieval
paradigm, an end user searches for unlabeled videos by ad-hoc queries described …

W2VV++: Fully deep learning for ad-hoc video search

X Li, C Xu, G Yang, Z Chen, J Dong - Proceedings of the 27th ACM …, 2019 - dl.acm.org
Ad-hoc video search (AVS) is an important yet challenging problem in multimedia retrieval.
Different from previous concept-based methods, we propose a fully deep learning method …

Text-adaptive multiple visual prototype matching for video-text retrieval

C Lin, A Wu, J Liang, J Zhang, W Ge… - Advances in neural …, 2022 - proceedings.neurips.cc
Cross-modal retrieval between videos and texts has gained increasing interest because of
the rapid emergence of videos on the web. Generally, a video contains rich instance and …

A comprehensive review of the video-to-text problem

J Perez-Martin, B Bustos, SJF Guimaraes… - Artificial Intelligence …, 2022 - Springer
Research in the Vision and Language area encompasses challenging topics that seek to
connect visual and textual information. When the visual information is related to videos, this …

SEA: Sentence encoder assembly for video retrieval by textual queries

X Li, F Zhou, C Xu, J Ji, G Yang - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …

Lightweight attentional feature fusion: A new baseline for text-to-video retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of
text-to-video retrieval. Different from previous research that considers feature fusion only at one …

Multi-dataset training of transformers for robust action recognition

J Liang, E Zhang, J Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the task of robust feature representations, aiming to generalize well on multiple
datasets for action recognition. We build our method on Transformers for its efficacy …

A transformer-based system for action spotting in soccer videos

H Zhu, J Liang, C Lin, J Zhang, J Hu - Proceedings of the 5th …, 2022 - dl.acm.org
Action Spotting in the broadcast soccer game is important to understand salient actions and
video summary applications. In this paper, we propose an efficient transformer-based …

Argus: Efficient activity detection system for extended video analysis

W Liu, G Kang, PY Huang, X Chang… - Proceedings of the …, 2020 - openaccess.thecvf.com
We propose an Efficient Activity Detection System, Argus, for Extended Video
Analysis in the surveillance scenario. For the spatial-temporal event detection in the …

Interpretable embedding for ad-hoc video search

J Wu, CW Ngo - Proceedings of the 28th ACM International Conference …, 2020 - dl.acm.org
Answering queries with semantic concepts has long been the mainstream approach to video
search. Only recently has its performance been surpassed by concept-free approaches, which …