GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large …
Z Xie, C Wu - arXiv preprint arXiv:2410.11190, 2024 - arxiv.org
GPT-4o, an all-encompassing model, represents a milestone in the development of multi-modal large models. It can understand visual, auditory, and textual modalities, directly output …
While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models …
Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants …
W Xu, S Wang, W Zhao, X Han, Y Yan, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated the ability to improve human efficiency through conversational interactions. Conventional LLM-powered dialogue systems …
A Dao, DB Vu, HH Ha - arXiv preprint arXiv:2410.15316, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of …
X Wang, Y Li, C Fu, L Xie, K Li, X Sun, L Ma - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models has brought many new smart applications, especially the excellent multimodal human-computer interaction in GPT-4o has brought …
Z Li, J Zhang, D Wang, Y Wang, X Huang, Z Wei - 2024 - preprints.org
With the success of large language models (LLMs) driving progress towards general-purpose AI, there has been a growing focus on extending these models to multi-modal …