Over-generation cannot be rewarded: Length-adaptive average lagging for simultaneous speech...

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-mediated
communication feel seamless when compared to human-to-human dialogue. In …

Cited by 80 · Related articles

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier

Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cited by 4 · Related articles · All 2 versions

Findings of the IWSLT 2023 evaluation campaign

M Agarwal, S Agarwal, A Anastasopoulos, L Bentivogli… - 2023 - um.edu.mt

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The
shared tasks address 9 scientific challenges in spoken language translation: simultaneous …

Cited by 46 · Related articles · All 10 versions

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org

The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Cited by 18 · Related articles · All 6 versions

TransLLaMa: LLM-based simultaneous translation system

R Koshkin, K Sudoh, S Nakamura - arXiv preprint arXiv:2402.04636, 2024 - arxiv.org

Decoder-only large language models (LLMs) have recently demonstrated impressive
capabilities in text generation and reasoning. Nonetheless, they have limited applications in …

Cited by 11 · Related articles · All 3 versions

Hybrid transducer and attention based encoder-decoder modeling for speech-to-text tasks

Y Tang, AY Sun, H Inaguma, X Chen, N Dong… - arXiv preprint arXiv …, 2023 - arxiv.org

Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks
for speech-to-text tasks. They are designed for different purposes and each has its own …

Cited by 12 · Related articles · All 5 versions

Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments

S Papi, P Wang, J Chen, J Xue, J Li… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …

Cited by 8 · Related articles · All 5 versions

Simultaneous machine translation with large language models

M Wang, J Zhao, TT Vu, F Shiri, E Shareghi… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLM) have demonstrated their abilities to solve various natural
language processing tasks through dialogue-based interactions. For instance, research …

Cited by 7 · Related articles · All 2 versions

AlignAtt: Using attention-based audio-translation alignments as a guide for simultaneous speech translation

S Papi, M Turchi, M Negri - arXiv preprint arXiv:2305.11408, 2023 - arxiv.org

Attention is the core mechanism of today's most used architectures for natural language
processing and has been analyzed from many perspectives, including its effectiveness for …

Cited by 8 · Related articles · All 8 versions

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

P Polák, B Yan, S Watanabe, A Waibel… - arXiv preprint arXiv …, 2023 - arxiv.org

Blockwise self-attentional encoder models have recently emerged as one promising end-to-end
approach to simultaneous speech translation. These models employ a blockwise beam …

Cited by 4 · Related articles · All 6 versions
