StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Models like GPT-4o enable real-time interaction with large language models (LLMs) through
speech, significantly enhancing user experience compared to traditional text-based …

Cited by 9 Related articles All 2 versions

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Q Fang, S Zhang, Z Ma, M Zhang, Y Feng - arXiv preprint arXiv …, 2024 - arxiv.org

Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose
the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end …

Cited by 1 Related articles All 2 versions

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

Q Fang, Z Ma, Y Zhou, M Zhang, Y Feng - arXiv preprint arXiv:2406.07330, 2024 - arxiv.org

Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but
it often faces the challenge of slow decoding due to the considerable length of speech …

Cited by 1 Related articles All 2 versions

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

S Cheng, Z Huang, T Ko, H Li, N Peng, L Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we present Cross Language Agent--Simultaneous Interpretation, CLASI, a
high-quality and human-like Simultaneous Speech Translation (SiST) System. Inspired by …

Cited by 1 Related articles All 3 versions

Decoder-only Streaming Transformer for Simultaneous Translation

S Guo, S Zhang, Y Feng - arXiv preprint arXiv:2406.03878, 2024 - arxiv.org

Simultaneous Machine Translation (SiMT) generates translation while reading source
tokens, essentially producing the target prefix based on the source prefix. To achieve good …

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

J Zhao, N Moritz, E Lakomkin, R Xie, Z Xiu… - arXiv preprint arXiv …, 2024 - arxiv.org

Cascaded speech-to-speech translation systems often suffer from the error accumulation
problem and high latency, which is a result of cascaded modules whose inference delays …
