Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation...

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 224 · Related articles · All 6 versions

[PDF] arxiv.org

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Cited by 98 · Related articles · All 8 versions

[PDF] arxiv.org

Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Cited by 19 · Related articles · All 4 versions

[PDF] arxiv.org

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org

How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

Cited by 84 · Related articles · All 9 versions

[PDF] arxiv.org

SpeechUT: Bridging speech and text with hidden-unit for encoder-decoder based speech-text pre-training

Z Zhang, L Zhou, J Ao, S Liu, L Dai, J Li… - arXiv preprint arXiv …, 2022 - arxiv.org

The rapid development of single-modal pre-training has prompted researchers to pay more
attention to cross-modal pre-training methods. In this paper, we propose a unified-modal …

Cited by 56 · Related articles · All 3 versions

[PDF] arxiv.org

M³ST: Mix at Three Levels for Speech Translation

X Cheng, Q Dong, F Yue, T Ko… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …

Cited by 52 · Related articles · All 3 versions

[PDF] mlr.press

Revisiting end-to-end speech-to-text translation from scratch

B Zhang, B Haddow… - … conference on machine …, 2022 - proceedings.mlr.press

End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its
encoder and/or decoder using source transcripts via speech recognition or text translation …

Cited by 33 · Related articles · All 7 versions

[PDF] arxiv.org

CMOT: Cross-modal mixup via optimal transport for speech translation

Y Zhou, Q Fang, Y Feng - arXiv preprint arXiv:2305.14635, 2023 - arxiv.org

End-to-end speech translation (ST) is the task of translating speech signals in the source
language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …

Cited by 25 · Related articles · All 5 versions

[PDF] mlr.press

Pre-training for speech translation: CTC meets optimal transport

PH Le, H Gong, C Wang, J Pino… - International …, 2023 - proceedings.mlr.press

The gap between speech and text modalities is a major challenge in speech-to-text
translation (ST). Different methods have been proposed to reduce this gap, but most of them …

Cited by 25 · Related articles · All 12 versions

[PDF] neurips.cc

ComSL: A composite speech-language model for end-to-end speech-to-text translation

C Le, Y Qian, L Zhou, S Liu, Y Qian… - Advances in Neural …, 2024 - proceedings.neurips.cc

Joint speech-language training is challenging due to the large demand for training data and
GPU consumption, as well as the modality gap between speech and language. We present …

Cited by 10 · Related articles · All 6 versions
