WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

A survey on speech large language models

J Peng, Y Wang, Y Xi, X Li, K Yu - arXiv preprint arXiv:2410.18908, 2024 - arxiv.org
Large Language Models (LLMs) exhibit strong contextual understanding and remarkable
multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …

A suite for acoustic language model evaluation

G Maimon, A Roth, Y Adi - arXiv preprint arXiv:2409.07437, 2024 - arxiv.org
Speech language models have recently demonstrated great potential as universal speech
processing systems. Such models have the ability to model the rich acoustic information …

Roadmap towards superhuman speech understanding using large language models

F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …

Overview of the Amphion Toolkit (v0.2)

J Li, X Zhang, Y Wang, H He, C Wang, L Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to
lower the entry barrier for junior researchers and engineers in these fields. It provides a …

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid development of large language models, researchers have created
increasingly advanced spoken dialogue systems that can naturally converse with humans …

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

J Wu, X Fan, BR Lu, X Jiang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
In recent years, we have observed a rapid advancement in speech language models
(SpeechLLMs), catching up with humans' listening and reasoning abilities. SpeechLLMs …

Zh-Paral: Benchmark Dataset for Comprehension of Chinese Paralinguistic Speech

D Chen, Z Wu - 2024 IEEE 14th International Symposium on …, 2024 - ieeexplore.ieee.org
The substantial data present in spoken language is classified into two categories in our
work: background and speaking. The speaking part is then divided into two subcategories …

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

C Chen, Y Hu, S Wang, H Wang, Z Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
An ideal multimodal agent should be aware of the quality of its input modalities. Recent
advances have enabled large language models (LLMs) to incorporate auditory systems for …