Deep cross-media knowledge transfer

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 485 · Related articles · All 4 versions

Ternary adversarial networks with self-supervision for zero-shot cross-modal retrieval

X Xu, H Lu, J Song, Y Yang… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

Given a query instance from one modality (e.g., image), cross-modal retrieval aims to find
semantically similar instances from another modality (e.g., text). To perform cross-modal …

Cited by 216 · Related articles · All 3 versions

Graph embedding contrastive multi-modal representation learning for clustering

W Xia, T Wang, Q Gao, M Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Multi-modal clustering (MMC) aims to explore complementary information from diverse
modalities for clustering performance facilitating. This article studies challenging problems in …

Cited by 40 · Related articles · All 4 versions

[PDF] thecvf.com

Multi-modality associative bridging through memory: Speech sound recollected from face video

M Kim, J Hong, SJ Park, YM Ro - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

In this paper, we introduce a novel audio-visual multi-modal bridging framework that can
utilize both audio and visual information, even with uni-modal inputs. We exploit a memory …

Cited by 45 · Related articles · All 8 versions

Joint feature synthesis and embedding: Adversarial cross-modal retrieval revisited

X Xu, K Lin, Y Yang, A Hanjalic… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Recently, generative adversarial network (GAN) has shown its strong ability on modeling
data distribution via adversarial learning. Cross-modal GAN, which attempts to utilize the …

Cited by 65 · Related articles · All 6 versions

[PDF] arxiv.org

MHTN: Modal-adversarial hybrid transfer network for cross-modal retrieval

X Huang, Y Peng, M Yuan - IEEE Transactions on Cybernetics, 2018 - ieeexplore.ieee.org

Cross-modal retrieval has drawn wide interest for retrieval across different modalities (such
as text, image, video, audio, and 3-D model). However, existing methods based on a deep …

Cited by 135 · Related articles · All 5 versions

[PDF] thecvf.com

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Cited by 15 · Related articles · All 3 versions

Learning cross-modal common representations by private–shared subspaces separation

X Xu, K Lin, L Gao, H Lu, HT Shen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Due to the inconsistent distributions and representations of different modalities (e.g., images
and texts), it is very challenging to correlate such heterogeneous data. A standard solution is …

Cited by 55 · Related articles · All 3 versions

[PDF] arxiv.org

Deep image-to-video adaptation and fusion networks for action recognition

Y Liu, Z Lu, J Li, T Yang, C Yao - IEEE Transactions on Image …, 2019 - ieeexplore.ieee.org

Existing deep learning methods for action recognition in videos require a large number of
labeled videos for training, which is labor-intensive and time-consuming. For the same …

Cited by 63 · Related articles · All 6 versions

[PDF] arxiv.org

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Cited by 9 · Related articles · All 3 versions
