J Peng, Y Wang, Y Xi, X Li, K Yu - arXiv preprint arXiv:2410.18908, 2024 - arxiv.org
Large Language Models (LLMs) exhibit strong contextual understanding and remarkable multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …
Speech language models have recently demonstrated great potential as universal speech processing systems. Such models have the ability to model the rich acoustic information …
F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has prompted efforts to integrate speech and audio data, aiming to create general foundation models capable of processing both textual …
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a …
X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid development of large language models, researchers have created increasingly advanced spoken dialogue systems that can naturally converse with humans …
In recent years, we have observed a rapid advancement in speech language models (SpeechLLMs), catching up with humans' listening and reasoning abilities. SpeechLLMs …
D Chen, Z Wu - 2024 IEEE 14th International Symposium on …, 2024 - ieeexplore.ieee.org
The substantial data present in spoken language is classified into two categories in our work: background and speaking. The speaking part is then divided into two subcategories …
C Chen, Y Hu, S Wang, H Wang, Z Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
An ideal multimodal agent should be aware of the quality of its input modalities. Recent advances have enabled large language models (LLMs) to incorporate auditory systems for …