Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org
How to learn a better speech representation for end-to-end speech-to-text translation (ST) with limited labeled data? Existing techniques often attempt to transfer powerful machine …
Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org
How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for …
The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal …
X Cheng, Q Dong, F Yue, T Ko… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's well known that data augmentation is an efficient method to improve performance for many …
B Zhang, B Haddow… - … conference on machine …, 2022 - proceedings.mlr.press
End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or decoder using source transcripts via speech recognition or text translation …
Y Zhou, Q Fang, Y Feng - arXiv preprint arXiv:2305.14635, 2023 - arxiv.org
End-to-end speech translation (ST) is the task of translating speech signals in the source language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …
The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them …
Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present …