Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Video pivoting unsupervised multi-modal machine translation

M Li, PY Huang, X Chang, J Hu, Y Yang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The main challenge in the field of unsupervised machine translation (UMT) is to associate
source-target sentences in the latent space. As people who speak different languages share …

Support-set bottlenecks for video-text representation learning

M Patrick, PY Huang, Y Asano, F Metze… - arXiv preprint arXiv …, 2020 - arxiv.org
The dominant paradigm for learning video-text representations--noise contrastive learning--
increases the similarity of the representations of pairs of samples that are known to be …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis

W Han, H Chen, A Gelbukh, A Zadeh… - Proceedings of the …, 2021 - dl.acm.org
Multimodal sentiment analysis aims to extract and integrate semantic information collected
from multiple modalities to recognize the expressed emotions and sentiment in multimodal …

Deep vision multimodal learning: Methodology, benchmark, and trend

W Chai, G Wang - Applied Sciences, 2022 - mdpi.com
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …

Scene graph as pivoting: Inference-time image-free unsupervised multimodal machine translation with visual scene hallucination

H Fei, Q Liu, M Zhang, M Zhang, TS Chua - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we investigate a more realistic unsupervised multimodal machine translation
(UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text …

IGLUE: A benchmark for transfer learning across modalities, tasks, and languages

E Bugliarello, F Liu, J Pfeiffer, S Reddy… - International …, 2022 - proceedings.mlr.press
Reliable evaluation benchmarks designed for replicability and comprehensiveness have
driven progress in machine learning. Due to the lack of a multilingual benchmark, however …

Cross2StrA: Unpaired cross-lingual image captioning with cross-lingual cross-modal structure-pivoted alignment

S Wu, H Fei, W Ji, TS Chua - arXiv preprint arXiv:2305.12260, 2023 - arxiv.org
Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency
issues, due to the inconsistencies of the semantic scene and syntax attributes during …

UC2: Universal cross-lingual cross-modal vision-and-language pre-training

M Zhou, L Zhou, S Wang, Y Cheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …