Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective …
X Zhu, W Tian, X Wang, L He, Y Xiao, X Wang… - Proceedings of the …, 2024 - dl.acm.org
Understanding the speaking style, such as the emotion of the interlocutor's speech, and responding with speech in an appropriate style is a natural occurrence in human …
YA Li, X Jiang, J Darefsky, G Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent …
Conversational assistants are becoming prevalent among the wider population due to their simplicity and increasing utility. However, the shortcomings of these tools are as renowned …
Q Chen, Y Chen, Y Chen, M Chen, Y Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in large language models (LLMs) and multimodal speech-text models have laid the groundwork for seamless voice interactions, enabling real-time …
While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models …
As speech becomes an increasingly common modality for interacting with large language models (LLMs), it is becoming desirable to develop systems where LLMs can take into …
Z Jia, R Liu - arXiv preprint arXiv:2412.18733, 2024 - arxiv.org
Conversational Speech Synthesis (CSS) aims to effectively take the multimodal dialogue history (MDH) to generate speech with appropriate conversational prosody for target …
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech …