A novel convolutional architecture for video-text retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Cited by 66 · Related articles · All 4 versions

[PDF] ijcai.org

[PDF] Multi-View Visual Semantic Embedding.

Z Li, C Guo, Z Feng, JN Hwang, X Xue - IJCAI, 2022 - ijcai.org

Abstract Visual Semantic Embedding (VSE) is a dominant method for vision-language
retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in …

Cited by 43 · Related articles · All 2 versions

Semantics-aware spatial-temporal binaries for cross-modal video retrieval

M Qi, J Qin, Y Yang, Y Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

With the current exponential growth of video-based social networks, video retrieval using
natural language is receiving ever-increasing attention. Most existing approaches tackle this …

Cited by 73 · Related articles · All 6 versions

[PDF] ijcai.org

[PDF] Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment.

W Wang, M Zhang, R Chen, G Cai, P Zhou, P Peng… - IJCAI, 2021 - ijcai.org

Multi-modal cues presented in videos are usually beneficial for the challenging video-text
retrieval task on internet-scale datasets. Recent video retrieval methods take advantage of …

Cited by 22 · Related articles · All 2 versions

FeatInter: exploring fine-grained object features for video-text retrieval

B Liu, Q Zheng, Y Wang, M Zhang, J Dong, X Wang - Neurocomputing, 2022 - Elsevier

In this paper, we target the challenging task of video-text retrieval. The common way for this
task is to learn a text-video joint embedding space by cross-modal representation learning …

Cited by 11 · Related articles · All 2 versions

What matters: Attentive and relational feature aggregation network for video-text retrieval

X Hao, Y Zhou, D Wu, W Zhang, B Li… - … on Multimedia and …, 2021 - ieeexplore.ieee.org

Cross-modal video-text retrieval has been an emerging task due to the rapid growth of user-
generated videos on the Internet. Most existing approaches focus on extracting visual …

Cited by 8 · Related articles · All 2 versions

[PDF] thecvf.com

SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model

Y Gao, Z Lu - Proceedings of the Asian Conference on …, 2022 - openaccess.thecvf.com

Most existing video-language modeling methods densely sample dozens (or even
hundreds) of video clips from each raw video to learn the video representation for text-to …

Cited by 1 · Related articles · All 3 versions

I3D convolutional network algorithm with feature gating

J Yu, Y Lai, Y Liu - Proceedings of the 2023 6th International …, 2023 - dl.acm.org

In order to effectively solve the persistent problems of low accuracy and high computational
complexity in video retrieval, we propose a feature-controlled retrieval algorithm which …

[CITATION] Two-fold Approach for Video Retrieval: Semantic Vectors to Guide Neural Network Training and Video Representation Approximation Via Language-Image …

JA Portillo Quintero - Instituto Tecnológico y de Estudios …
