What matters for ad-hoc video search? a large-scale evaluation on TRECVID

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of
text-to-video retrieval. Different from previous research that considers feature fusion only at one …

Cited by 41 · Related articles · All 6 versions

Are all combinations equal? Combining textual and visual features with multiple space learning for text-based video retrieval

D Galanopoulos, V Mezaris - European Conference on Computer Vision, 2022 - Springer

In this paper we tackle the cross-modal video retrieval problem and, more specifically, we
focus on text-to-video retrieval. We investigate how to optimally combine multiple diverse …

Cited by 11 · Related articles · All 5 versions

Accommodating audio modality in CLIP for multimodal processing

L Ruan, A Hu, Y Song, L Zhang, S Zheng… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Multimodal processing has attracted much attention lately especially with the success of
pre-training. However, the exploration has mainly focused on vision-language pre-training, as …

Cited by 11 · Related articles · All 5 versions

Learn to understand negation in video retrieval

Z Wang, A Chen, F Hu, X Li - Proceedings of the 30th ACM International …, 2022 - dl.acm.org

Negation is a common linguistic skill that allows human to express what we do NOT want.
Naturally, one might expect video retrieval to support natural-language queries with …

Cited by 11 · Related articles · All 4 versions

CLIPRerank: An Extremely Simple Method for Improving Ad-hoc Video Search

A Chen, F Zhou, Z Wang, X Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Ad-hoc Video Search (AVS) enables users to search for unlabeled video content using
on-the-fly textual queries. Current deep learning-based models for AVS are trained to optimize …

[PDF] Supplementary material of Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - ecva.net

In this supplement, we provide more experimental results that are not included in the paper
due to space limit. Distribution of attentional weights per feature. We analyze the attentional …

[PDF] Renmin University of China at TRECVID 2021: Searching and Describing Video

X Li, A Chen, F Hu, X Chen, C Dong, G Yang - www-nlpir.nist.gov

In this paper, we summarize our TRECVID 2021 experiments. We participated in two tasks:
Ad-hoc Video Search (AVS) and Video-to-Text Description Generation (VTT). For the AVS …
