When LLMs meets acoustic landmarks: An efficient approach to integrate speech into large language models for depression detection

X Zhang, H Liu, K Xu, Q Zhang, D Liu, B Ahmed… - arXiv preprint arXiv …, 2024 - arxiv.org
arXiv preprint arXiv:2402.13276, 2024
Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation is their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the use of LLMs to identify and analyze depressive states remains relatively unexplored. In this paper, we present an approach to integrating acoustic speech information into the LLM framework for multimodal depression detection. We investigate an efficient method for depression detection that integrates speech signals into LLMs using acoustic landmarks. Acoustic landmarks are specific to the pronunciation of spoken words, so incorporating them adds critical dimensions to text transcripts. This integration also provides insight into an individual's unique speech patterns, which may reveal their mental state. Evaluations of the proposed approach on the DAIC-WOZ dataset show state-of-the-art results compared with existing audio-text baselines. Beyond depression detection, the approach offers a new perspective on enhancing the ability of LLMs to comprehend and process speech signals.


example.edu/paper.pdf