Z Kong, A Goel, R Badlani, W Ping, R Valle… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Augmenting large language models (LLMs) to understand audio, including non-speech
sounds and non-verbal speech, is critically important for diverse real-world applications of …