JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

D Xin, J Jiang, S Takamichi, Y Saito, A Aizawa… - IEEE …, 2024 - ieeexplore.ieee.org
We present the JVNV, a Japanese emotional speech corpus with verbal content and
nonverbal vocalizations whose scripts are generated by a large-scale language model …

Emotion Expression Estimates to Measure and Improve Multimodal Social-Affective Interactions

JA Brooks, V Tiruvadi, A Baird, P Tzirakis, H Li… - … Publication of the 25th …, 2023 - dl.acm.org
Large language models (LLMs) are being adopted in a wide range of applications, but an
understanding of other social-affective signals is needed to support effective human …

Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

H Ameer, S Latif, R Latif, S Mukhtar - arXiv preprint arXiv:2311.05203, 2023 - arxiv.org
In recent years, advancements in the field of speech processing have led to cutting-edge
deep learning algorithms with immense potential for real-world applications. The automated …

Sigh!!! There is more than just faces and verbal speech to recognize emotion in human-robot interaction

RS Maharjan, M Romeo… - 2024 33rd IEEE …, 2024 - ieeexplore.ieee.org
Understanding human emotions is paramount for effective human-human interactions. As
technology advances, social robots are increasingly being developed with the capability to …

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition

OC Phukan, MM Akhtar, SR Behera, S Kalita… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we investigate multimodal foundation models (MFMs) for emotion recognition
from non-verbal sounds. We hypothesize that MFMs, with their joint pre-training across …

Maia: A Real-time Non-Verbal Chat for Human-AI Interaction

D Costea, A Marcu, C Lazar, M Leordeanu - arXiv preprint arXiv …, 2024 - arxiv.org
Face-to-face communication modeling in computer vision is an area of research focusing on
developing algorithms that can recognize and analyze non-verbal cues and behaviors …

Unsupervised pronunciation assessment analysis using utterance level alignment distance with self-supervised representations

N Anand, M Sirigiraju, C Yarra - 2023 IEEE 20th India Council …, 2023 - ieeexplore.ieee.org
The pronunciation quality of second language (L2) learners can be affected by several factors, including the following seven: Intelligibility, Intonation, Phoneme quality …

Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

N Anand, M Sirigiraju, C Yarra - arXiv preprint arXiv:2306.08845, 2023 - arxiv.org
Speech intelligibility is crucial in language learning for effective communication. Thus, to
develop computer-assisted language learning systems, automatic speech intelligibility …

[PDF] Unsupervised spoken content mismatch detection for automatic data validation under Indian context for building HCI systems

N Anand - 2024 - cdn.iiit.ac.in
This thesis explores the critical challenges of, and provides solutions for, automatic spoken data validation in the complex multilingual and multicultural context of India, which is …