SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

A survey on speech large language models

J Peng, Y Wang, Y Xi, X Li, K Yu - arXiv preprint arXiv:2410.18908, 2024 - arxiv.org

Large Language Models (LLMs) exhibit strong contextual understanding and remarkable
multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

A suite for acoustic language model evaluation

G Maimon, A Roth, Y Adi - arXiv preprint arXiv:2409.07437, 2024 - arxiv.org

Speech language models have recently demonstrated great potential as universal speech
processing systems. Such models have the ability to model the rich acoustic information …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Roadmap towards superhuman speech understanding using large language models

F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org

The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Overview of the Amphion Toolkit (v0.2)

J Li, X Zhang, Y Wang, H He, C Wang, L Wang… - arXiv preprint arXiv …, 2025 - arxiv.org

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to
lower the entry barrier for junior researchers and engineers in these fields. It provides a …

[PDF] arxiv.org

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

C Chen, Y Hu, S Wang, H Wang, Z Chen… - arXiv preprint arXiv …, 2025 - arxiv.org

An ideal multimodal agent should be aware of the quality of its input modalities. Recent
advances have enabled large language models (LLMs) to incorporate auditory systems for …
