Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
Large-scale automatic speech translation systems today lack key features that help
machine-mediated communication feel seamless when compared to human-to-human dialogue. In …

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Findings of the IWSLT 2023 evaluation campaign

M Agarwal, S Agarwal, A Anastasopoulos, L Bentivogli… - 2023 - um.edu.mt
This paper reports on the shared tasks organized by the 20th IWSLT Conference. The
shared tasks address 9 scientific challenges in spoken language translation: simultaneous …

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org
The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

TransLLaMa: LLM-based simultaneous translation system

R Koshkin, K Sudoh, S Nakamura - arXiv preprint arXiv:2402.04636, 2024 - arxiv.org
Decoder-only large language models (LLMs) have recently demonstrated impressive
capabilities in text generation and reasoning. Nonetheless, they have limited applications in …

Hybrid transducer and attention based encoder-decoder modeling for speech-to-text tasks

Y Tang, AY Sun, H Inaguma, X Chen, N Dong… - arXiv preprint arXiv …, 2023 - arxiv.org
Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks
for speech-to-text tasks. They are designed for different purposes and each has its own …

Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments

S Papi, P Wang, J Chen, J Xue, J Li… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …

Simultaneous machine translation with large language models

M Wang, J Zhao, TT Vu, F Shiri, E Shareghi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLM) have demonstrated their abilities to solve various natural
language processing tasks through dialogue-based interactions. For instance, research …

AlignAtt: Using attention-based audio-translation alignments as a guide for simultaneous speech translation

S Papi, M Turchi, M Negri - arXiv preprint arXiv:2305.11408, 2023 - arxiv.org
Attention is the core mechanism of today's most used architectures for natural language
processing and has been analyzed from many perspectives, including its effectiveness for …

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

P Polák, B Yan, S Watanabe, A Waibel… - arXiv preprint arXiv …, 2023 - arxiv.org
Blockwise self-attentional encoder models have recently emerged as one promising
end-to-end approach to simultaneous speech translation. These models employ a blockwise beam …