- Academic resource search

Dual encoding for video retrieval by text

J Dong, X Li, C Xu, X Yang, G Yang… - … on Pattern Analysis …, 2021 - ieeexplore.ieee.org

This paper attacks the challenging problem of video retrieval by text. In such a retrieval
paradigm, an end user searches for unlabeled videos by ad-hoc queries described …

Cited by 239 | Related articles | All 7 versions

[PDF] lixirong.net

W2vv++ fully deep learning for ad-hoc video search

X Li, C Xu, G Yang, Z Chen, J Dong - Proceedings of the 27th ACM …, 2019 - dl.acm.org

Ad-hoc video search (AVS) is an important yet challenging problem in multimedia retrieval.
Different from previous concept-based methods, we propose a fully deep learning method …

Cited by 155 | Related articles | All 6 versions

[PDF] neurips.cc

Text-adaptive multiple visual prototype matching for video-text retrieval

C Lin, A Wu, J Liang, J Zhang, W Ge… - Advances in neural …, 2022 - proceedings.neurips.cc

Cross-modal retrieval between videos and texts has gained increasing interest because of
the rapid emergence of videos on the web. Generally, a video contains rich instance and …

Cited by 26 | Related articles | All 5 versions

[PDF] arxiv.org

A comprehensive review of the video-to-text problem

J Perez-Martin, B Bustos, SJF Guimaraes… - Artificial Intelligence …, 2022 - Springer

Research in the Vision and Language area encompasses challenging topics that seek to
connect visual and textual information. When the visual information is related to videos, this …

Cited by 19 | Related articles | All 8 versions

[PDF] arxiv.org

SEA: Sentence encoder assembly for video retrieval by textual queries

X Li, F Zhou, C Xu, J Ji, G Yang - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …

Cited by 57 | Related articles | All 3 versions

[PDF] arxiv.org

Lightweight attentional feature fusion: A new baseline for text-to-video retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-
video retrieval. Different from previous research that considers feature fusion only at one …

Cited by 42 | Related articles | All 6 versions

[PDF] neurips.cc

Multi-dataset training of transformers for robust action recognition

J Liang, E Zhang, J Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the task of robust feature representations, aiming to generalize well on multiple
datasets for action recognition. We build our method on Transformers for its efficacy …

Cited by 12 | Related articles | All 5 versions

[PDF] acm.org

A transformer-based system for action spotting in soccer videos

H Zhu, J Liang, C Lin, J Zhang, J Hu - Proceedings of the 5th …, 2022 - dl.acm.org

Action Spotting in the broadcast soccer game is important to understand salient actions and
video summary applications. In this paper, we propose an efficient transformer-based …

Cited by 20 | Related articles

[PDF] thecvf.com

Argus: Efficient activity detection system for extended video analysis

W Liu, G Kang, PY Huang, X Chang… - Proceedings of the …, 2020 - openaccess.thecvf.com

We propose an Efficient Activity Detection System, Argus, for Extended Video
Analysis in the surveillance scenario. For the spatial-temporal event detection in the …

Cited by 50 | Related articles | All 6 versions

[PDF] arxiv.org

Interpretable embedding for ad-hoc video search

J Wu, CW Ngo - Proceedings of the 28th ACM International Conference …, 2020 - dl.acm.org

Answering query with semantic concepts has long been the mainstream approach for video
search. Until recently, its performance is surpassed by concept-free approach, which …

Cited by 36 | Related articles | All 6 versions
