Lightweight attentional feature fusion: A new baseline for text-to-video retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of
text-to-video retrieval. Different from previous research that considers feature fusion only at one …

Are all combinations equal? Combining textual and visual features with multiple space learning for text-based video retrieval

D Galanopoulos, V Mezaris - European Conference on Computer Vision, 2022 - Springer
In this paper we tackle the cross-modal video retrieval problem and, more specifically, we
focus on text-to-video retrieval. We investigate how to optimally combine multiple diverse …

Accommodating audio modality in CLIP for multimodal processing

L Ruan, A Hu, Y Song, L Zhang, S Zheng… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Multimodal processing has attracted much attention lately especially with the success of
pre-training. However, the exploration has mainly focused on vision-language pre-training, as …

Learn to understand negation in video retrieval

Z Wang, A Chen, F Hu, X Li - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Negation is a common linguistic skill that allows humans to express what we do NOT want.
Naturally, one might expect video retrieval to support natural-language queries with …

CLIPRerank: An Extremely Simple Method for Improving Ad-hoc Video Search

A Chen, F Zhou, Z Wang, X Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Ad-hoc Video Search (AVS) enables users to search for unlabeled video content using
on-the-fly textual queries. Current deep learning-based models for AVS are trained to optimize …

[PDF] Supplementary material of Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - ecva.net
In this supplement, we provide more experimental results that are not included in the paper
due to space limit. Distribution of attentional weights per feature. We analyze the attentional …

[PDF] Renmin University of China at TRECVID 2021: Searching and Describing Video

X Li, A Chen, F Hu, X Chen, C Dong, G Yang - www-nlpir.nist.gov
In this paper, we summarize our TRECVID 2021 experiments. We participated in two tasks:
Ad-hoc Video Search (AVS) and Video-to-Text Description Generation (VTT). For the AVS …