A Chen, F Hu, Z Wang, F Zhou… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
For quantifying progress in Ad-hoc Video Search (AVS), the annual TRECVID AVS task is an important international evaluation. Solutions submitted by the task participants vary in terms …
Cross-modal representation learning has become a new normal for bridging the semantic gap between text and visual data. Learning modality agnostic representations in a …
X Li, F Hu, R Zhao, Z Wang, J Liu, J Liu, B Lan… - TRECVID, 2023 - www-nlpir.nist.gov
We summarize our TRECVID 2023 Ad-hoc Video Search (AVS) experiments. We focus on leveraging pre-trained multimodal models for video and text representation. For video …
J Wu, Y Jiang, XY Wei, Q Li - arXiv preprint arXiv:2412.15514, 2024 - arxiv.org
Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find …
NM Nguyen, TD Mai, DD Le - Proceedings of the Asian …, 2024 - openaccess.thecvf.com
In this study, we propose a novel approach for Ad-hoc Video Search that leverages the power of image search engines to synthesize query images for corresponding textual …
X Li, A Chen, Z Wang, F Hu, K Tian, X Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for …
A Chen, F Zhou, Z Wang, X Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Ad-hoc Video Search (AVS) enables users to search for unlabeled video content using on-the-fly textual queries. Current deep learning-based models for AVS are trained to optimize …
This year, we explore generation-augmented retrieval for the TRECVid AVS task. Specifically, the understanding of textual query is enhanced by three generations, including …