Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Multi-View Visual Semantic Embedding.

Z Li, C Guo, Z Feng, JN Hwang, X Xue - IJCAI, 2022 - ijcai.org
Visual Semantic Embedding (VSE) is a dominant method for vision-language
retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in …

Semantics-aware spatial-temporal binaries for cross-modal video retrieval

M Qi, J Qin, Y Yang, Y Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
With the current exponential growth of video-based social networks, video retrieval using
natural language is receiving ever-increasing attention. Most existing approaches tackle this …

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment.

W Wang, M Zhang, R Chen, G Cai, P Zhou, P Peng… - IJCAI, 2021 - ijcai.org
Multi-modal cues presented in videos are usually beneficial for the challenging video-text
retrieval task on internet-scale datasets. Recent video retrieval methods take advantage of …

FeatInter: exploring fine-grained object features for video-text retrieval

B Liu, Q Zheng, Y Wang, M Zhang, J Dong, X Wang - Neurocomputing, 2022 - Elsevier
In this paper, we target the challenging task of video-text retrieval. The common way for this
task is to learn a text-video joint embedding space by cross-modal representation learning …

What matters: Attentive and relational feature aggregation network for video-text retrieval

X Hao, Y Zhou, D Wu, W Zhang, B Li… - … on Multimedia and …, 2021 - ieeexplore.ieee.org
Cross-modal video-text retrieval has been an emerging task due to the rapid growth of
user-generated videos on the Internet. Most existing approaches focus on extracting visual …

SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model

Y Gao, Z Lu - Proceedings of the Asian Conference on …, 2022 - openaccess.thecvf.com
Most existing video-language modeling methods densely sample dozens (or even
hundreds) of video clips from each raw video to learn the video representation for text-to …

I3D convolutional network algorithm with feature gating

J Yu, Y Lai, Y Liu - Proceedings of the 2023 6th International …, 2023 - dl.acm.org
In order to effectively solve the persistent problems of low accuracy and high computational
complexity in video retrieval, we propose a feature-controlled retrieval algorithm which …

Two-fold Approach for Video Retrieval: Semantic Vectors to Guide Neural Network Training and Video Representation Approximation Via Language-Image …

JA Portillo Quintero - Instituto Tecnológico y de Estudios …