STEMM: Self-learning with speech-text manifold mixup for speech translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Cited by 13 · Related articles · All 4 versions

[PDF] arxiv.org

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org

How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

Cited by 77 · Related articles · All 9 versions

[PDF] arxiv.org

Speechut: Bridging speech and text with hidden-unit for encoder-decoder based speech-text pre-training

Z Zhang, L Zhou, J Ao, S Liu, L Dai, J Li… - arXiv preprint arXiv …, 2022 - arxiv.org

The rapid development of single-modal pre-training has prompted researchers to pay more
attention to cross-modal pre-training methods. In this paper, we propose a unified-modal …

Cited by 52 · Related articles · All 3 versions

[PDF] arxiv.org

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier

Abstract Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cited by 4 · Related articles · All 2 versions

[PDF] neurips.cc

Daspeech: Directed acyclic transformer for fast and high-quality speech-to-speech translation

Q Fang, Y Zhou, Y Feng - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Direct speech-to-speech translation (S2ST) translates speech from one language into
another using a single model. However, due to the presence of linguistic and acoustic …

Cited by 6 · Related articles · All 5 versions

[PDF] arxiv.org

M³ST: Mix at Three Levels for Speech Translation

X Cheng, Q Dong, F Yue, T Ko… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …

Cited by 51 · Related articles · All 3 versions

[PDF] thecvf.com

Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition

X Cheng, T Jin, R Huang, L Li, W Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com

Multi-media communications facilitate global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …

Cited by 22 · Related articles · All 6 versions

[PDF] arxiv.org

CMOT: Cross-modal mixup via optimal transport for speech translation

Y Zhou, Q Fang, Y Feng - arXiv preprint arXiv:2305.14635, 2023 - arxiv.org

End-to-end speech translation (ST) is the task of translating speech signals in the source
language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …

Cited by 23 · Related articles · All 5 versions

[PDF] arxiv.org

Dub: Discrete unit back-translation for speech translation

D Zhang, R Ye, T Ko, M Wang, Y Zhou - arXiv preprint arXiv:2305.11411, 2023 - arxiv.org

How can speech-to-text translation (ST) perform as well as machine translation (MT)? The
key point is to bridge the modality gap between speech and text so that useful MT …

Cited by 22 · Related articles · All 5 versions

[PDF] arxiv.org

Neural machine translation with phrase-level universal visual representations

Q Fang, Y Feng - arXiv preprint arXiv:2203.10299, 2022 - arxiv.org

Multimodal machine translation (MMT) aims to improve neural machine translation (NMT)
with additional visual information, but most existing MMT methods require paired input of …

Cited by 36 · Related articles · All 4 versions
