Lightweight attentional feature fusion: A new baseline for text-to-video retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-
video retrieval. Different from previous research that considers feature fusion only at one …

What matters for ad-hoc video search? A large-scale evaluation on TRECVID

A Chen, F Hu, Z Wang, F Zhou… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
For quantifying progress in Ad-hoc Video Search (AVS), the annual TRECVID AVS task is an
important international evaluation. Solutions submitted by the task participants vary in terms …

(Un)likelihood training for interpretable embedding

J Wu, CW Ngo, WK Chan, Z Hou - ACM Transactions on Information …, 2023 - dl.acm.org
Cross-modal representation learning has become a new normal for bridging the semantic
gap between text and visual data. Learning modality agnostic representations in a …

Renmin University of China and Tencent at TRECVID 2023: Harnessing pre-trained models for ad-hoc video search

X Li, F Hu, R Zhao, Z Wang, J Liu, J Liu, B Lan… - TRECVID, 2023 - www-nlpir.nist.gov
We summarize our TRECVID 2023 Ad-hoc Video Search (AVS) experiments. We focus on
leveraging pre-trained multimodal models for video and text representation. For video …

PolySmart @ TRECVid 2024 Medical Video Question Answering

J Wu, Y Jiang, XY Wei, Q Li - arXiv preprint arXiv:2412.15514, 2024 - arxiv.org
Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval
and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find …

Text Query to Web Image to Video: A Comprehensive Ad-Hoc Video Search

NM Nguyen, TD Mai, DD Le - Proceedings of the Asian …, 2024 - openaccess.thecvf.com
In this study, we propose a novel approach for Ad-hoc Video Search that leverages the
power of image search engines to synthesize query images for corresponding textual …

Renmin University of China at TRECVID 2022: Improving video search by feature fusion and negation understanding

X Li, A Chen, Z Wang, F Hu, K Tian, X Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is
built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for …

CLIPRerank: An Extremely Simple Method For Improving Ad-Hoc Video Search

A Chen, F Zhou, Z Wang, X Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Ad-hoc Video Search (AVS) enables users to search for unlabeled video content using on-
the-fly textual queries. Current deep learning-based models for AVS are trained to optimize …

PolySmart and VIREO @ TRECVid 2024 Ad-hoc Video Search

J Wu, CW Ngo, XY Wei, Q Li - arXiv preprint arXiv:2412.15494, 2024 - arxiv.org
This year, we explore generation-augmented retrieval for the TRECVid AVS task.
Specifically, the understanding of textual query is enhanced by three generations, including …

(Un)likelihood Training for Interpretable Embedding

J Wu, CW Ngo, WK Chan, Z Hou - arXiv preprint arXiv:2207.00282, 2022 - arxiv.org
Cross-modal representation learning has become a new normal for bridging the semantic
gap between text and visual data. Learning modality agnostic representations in a …