Image-text retrieval with binary and continuous label supervision

4 results (0.02 seconds)

MoCap-Video Data Retrieval with Deep Cross-Modal Learning

L Zhang, J Peng, N Lv - International Conference on Multimedia Modeling, 2024 - Springer

Cross-modal retrieval between video and motion capture (MoCap) data facilitates efficient
reuse of human motion data in either skeletal or video format. For this purpose, we propose …

Cited by: 2 (2 versions)

[PDF] arxiv.org

Text-based Audio Retrieval by Learning from Similarities between Audio Captions

H Xie, K Khorrami, O Räsänen… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org

This letter proposes to use similarities of audio captions for estimating audio-caption
relevances to be used for training text-based audio retrieval systems. Current audio-caption …

Dynamic Soft Labeling for Visual Semantic Embedding

J Yu, Y Ding, J Dong, Y Li - … of the 2024 International Conference on …, 2024 - dl.acm.org

Visual Semantic Embedding (VSE) is a prominent approach in image-text retrieval, aiming to
learn a deep embedding space that aligns visual data with semantic text labels. However …

[PDF] madison-proceedings.com

A Review on the Progress of Text-based Image Retrieval Technology

R Ma, X Cui, W Li, L Lu - Advances in Engineering …, 2024 - madison-proceedings.com
