Exploiting visual semantic reasoning for video-text retrieval

S Liu, H Fan, S Qian, Y Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com

Video-Text Retrieval has been a hot research topic with the growth of multimedia
data on the internet. Transformer for video-text learning has attracted increasing attention …

Cited by 181 · Related articles · All 6 versions

[PDF] thecvf.com

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Cited by 13 · Related articles · All 4 versions

[PDF] arxiv.org

Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Cited by 66 · Related articles · All 4 versions

[PDF] arxiv.org

Partially relevant video retrieval

J Dong, X Chen, M Zhang, X Yang, S Chen… - Proceedings of the 30th …, 2022 - dl.acm.org

Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …

Cited by 49 · Related articles · All 3 versions

Spatial-temporal graphs for cross-modal text2video retrieval

X Song, J Chen, Z Wu, YG Jiang - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Cross-modal text to video retrieval aims to find relevant videos given text queries, which is
crucial for various real-world applications. The key to address this task is to build the …

Cited by 70 · Related articles · All 2 versions

[PDF] arxiv.org

Hanet: Hierarchical alignment networks for video-text retrieval

P Wu, X He, M Tang, Y Lv, J Liu - Proceedings of the 29th ACM …, 2021 - dl.acm.org

Video-text retrieval is an important yet challenging task in vision-language understanding,
which aims to learn a joint embedding space where related video and text instances are …

Cited by 63 · Related articles · All 4 versions

[PDF] ijcai.org

[PDF] Multi-View Visual Semantic Embedding

Z Li, C Guo, Z Feng, JN Hwang, X Xue - IJCAI, 2022 - ijcai.org

Visual Semantic Embedding (VSE) is a dominant method for vision-language
retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in …

Cited by 43 · Related articles · All 2 versions

Hierarchical cross-modal graph consistency learning for video-text retrieval

W Jin, Z Zhao, P Zhang, J Zhu, X He… - Proceedings of the 44th …, 2021 - dl.acm.org

Due to the popularity of video contents on the Internet, the information retrieval between
videos and texts has attracted broad interest from researchers, which is a challenging cross …

Cited by 46 · Related articles

Semantics-aware spatial-temporal binaries for cross-modal video retrieval

M Qi, J Qin, Y Yang, Y Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

With the current exponential growth of video-based social networks, video retrieval using
natural language is receiving ever-increasing attention. Most existing approaches tackle this …

Cited by 73 · Related articles · All 6 versions

Using multimodal contrastive knowledge distillation for video-text retrieval

W Ma, Q Chen, T Zhou, S Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Cross-modal retrieval aims to enable a flexible bi-directional retrieval experience across
different modalities (e.g., searching for videos with texts). Many existing efforts tend to learn a …

Cited by 25 · Related articles · All 2 versions
