Multi-level alignment network for domain adaptive cross-modal retrieval

[HTML] GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

H Liao, H Shen, Z Li, C Wang, G Li, Y Bie… - … in Transportation Research, 2024 - Elsevier

In the field of autonomous vehicles (AVs), accurately discerning commander intent and
executing linguistic commands within a visual context presents a significant challenge. This …

Cited by 20 - Related articles - All 4 versions

[PDF] arxiv.org

Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Cited by 58 - Related articles - All 4 versions

[PDF] thecvf.com

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Cited by 13 - Related articles - All 3 versions

Multi-level knowledge-driven feature representation and triplet loss optimization network for image–text retrieval

X Qin, L Li, F Hao, M Ge, G Pang - Information Processing & Management, 2024 - Elsevier

Image–text retrieval plays a considerable role in associating vision and language. Existing
mainstream approaches focus on fine-grained alignment while ignoring the influence of …

Cited by 3 - Related articles - All 2 versions

[PDF] neurips.cc

Uncertainty-aware alignment network for cross-domain video-text retrieval

X Hao, W Zhang - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Video-text retrieval is an important but challenging research task in the multimedia
community. In this paper, we address the challenge task of Unsupervised Domain …

Cited by 4 - Related articles - All 3 versions

Domain adaptive twin support vector machine learning using privileged information

Y Li, H Sun, W Yan - Neurocomputing, 2022 - Elsevier

In the fields of computer vision and machine learning, domain adaptation has been
extensively studied and the main challenge in the case is how to transform the existing …

Cited by 13 - Related articles - All 2 versions

Multi-level feature disentanglement network for cross-dataset face forgery detection

Z Fu, X Chen, D Liu, X Qu, J Dong, X Zhang… - Image and Vision …, 2023 - Elsevier

Synthesizing videos with forged faces is a fundamental yet important safety-critical task that
has caused severe security issues in recent years. Although many existing face forgery …

Cited by 3 - Related articles - All 2 versions

Multilevel Semantic Interaction Alignment for Video–Text Cross-Modal Retrieval

L Chen, Z Deng, L Liu, S Yin - IEEE Transactions on Circuits …, 2024 - ieeexplore.ieee.org

Video–text cross-modal retrieval (VTR) is more natural and challenging than image–text
retrieval, which has attracted increasing interest from researchers in recent years. To align …

Cited by 2 - Related articles

FeatInter: exploring fine-grained object features for video-text retrieval

B Liu, Q Zheng, Y Wang, M Zhang, J Dong, X Wang - Neurocomputing, 2022 - Elsevier

In this paper, we target the challenging task of video-text retrieval. The common way for this
task is to learn a text-video joint embedding space by cross-modal representation learning …

Cited by 8 - Related articles - All 2 versions

[PDF] arxiv.org

Unpaired referring expression grounding via bidirectional cross-modal matching

H Shi, M Hayat, J Cai - Neurocomputing, 2023 - Elsevier

Referring expression grounding is an important and challenging task in computer vision. To
avoid the laborious annotation in conventional referring grounding, unpaired referring …

Cited by 6 - Related articles - All 5 versions
