MuST-C: a multilingual speech translation corpus

MA Di Gangi, R Cattoni, L Bentivogli, M Negri… - Proceedings of the …, 2019 - cris.fbk.eu
Current research on spoken language translation (SLT) must confront the scarcity of
sizeable and publicly available training corpora. This problem hinders the adoption of neural …

MuST-C: A multilingual corpus for end-to-end speech translation

R Cattoni, MA Di Gangi, L Bentivogli, M Negri… - Computer speech & …, 2021 - Elsevier
End-to-end spoken language translation (SLT) has recently gained popularity thanks to the
advancement of sequence-to-sequence learning in its two parent tasks: automatic speech …

Towards automatic face-to-face translation

P KR, R Mukhopadhyay, J Philip, A Jha… - Proceedings of the 27th …, 2019 - dl.acm.org
In light of the recent breakthroughs in automatic machine translation systems, we propose a
novel approach that we term "Face-to-Face Translation". As today's digital communication …

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

Augmenting LibriSpeech with French translations: A multimodal corpus for direct speech translation evaluation

AC Kocabiyikoglu, L Besacier, O Kraif - arXiv preprint arXiv:1802.03142, 2018 - arxiv.org
Recent works in spoken language translation (SLT) have attempted to build end-to-end
speech-to-text translation without using source language transcription during learning or …

End-to-End Speech-to-Text Translation: A Survey

N Sethiya, CK Maurya - arXiv preprint arXiv:2312.01053, 2023 - arxiv.org
Speech-to-text translation pertains to the task of converting speech signals in one language to
text in another language. It finds its application in various domains, such as hands-free …

Large-scale streaming end-to-end speech translation with neural transducers

J Xue, P Wang, J Li, M Post, Y Gaur - arXiv preprint arXiv:2204.05352, 2022 - arxiv.org
Neural transducers have been widely used in automatic speech recognition (ASR). In this
paper, we introduce them to streaming end-to-end speech translation (ST), which aims to …

MaSS: A large and clean multilingual corpus of sentence-aligned spoken utterances extracted from the Bible

MZ Boito, WN Havard, M Garnerin, ÉL Ferrand… - arXiv preprint arXiv …, 2019 - arxiv.org
The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly published
multilingual speech dataset based on recorded readings of the New Testament. It provides …

BSTC: A large-scale Chinese-English speech translation dataset

R Zhang, X Wang, C Zhang, Z He, H Wu, Z Li… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-
English speech translation dataset. This dataset is constructed based on a collection of …

BLSP: Bootstrapping language-speech pre-training via behavior alignment of continuation writing

C Wang, M Liao, Z Huang, J Lu, J Wu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
The emergence of large language models (LLMs) has sparked significant interest in
extending their remarkable language capabilities to speech. However, modality alignment …