Multi-channel spectrograms for speech processing applications using deep learning methods

T Arias-Vergara, P Klumpp, JC Vasquez-Correa… - Pattern Analysis and …, 2021 - Springer
Time–frequency representations of speech signals provide dynamic information about
how the frequency components change with time. In order to process this information, deep …
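(As an illustrative aside, not the multi-channel construction described in this paper: a standard log-mel spectrogram is a common time–frequency input for deep models. The sketch below assumes the librosa library, a hypothetical file name "speech.wav", and typical 25 ms/10 ms framing; all of these are assumptions for the example.)

```python
# Minimal sketch: log-mel spectrogram as a time-frequency representation.
# Assumptions: librosa is available, "speech.wav" is a mono speech recording.
import librosa
import numpy as np

# Load the speech signal, resampled to 16 kHz (assumed sampling rate).
signal, sr = librosa.load("speech.wav", sr=16000)

# Short-time analysis: 25 ms windows with a 10 ms hop.
n_fft = int(0.025 * sr)
hop_length = int(0.010 * sr)

# Mel-scaled spectrogram: energy per mel band as a function of time.
mel_spec = librosa.feature.melspectrogram(
    y=signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=80
)

# Log compression, commonly applied before feeding the features to a network.
log_mel = librosa.power_to_db(mel_spec, ref=np.max)
print(log_mel.shape)  # (n_mels, n_frames)
```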

Cognitive determinants of dysarthria in Parkinson's disease: an automated machine learning approach

AM García, T Arias-Vergara… - Movement …, 2021 - Wiley Online Library
Background: Dysarthric symptoms in Parkinson's disease (PD) vary greatly
across cohorts. Abundant research suggests that such heterogeneity could reflect subject …

Multimodal end-to-end sparse model for emotion recognition

W Dai, S Cahyawijaya, Z Liu, P Fung - arXiv preprint arXiv:2103.09666, 2021 - arxiv.org
Existing works on multimodal affective computing tasks, such as emotion recognition,
generally adopt a two-phase pipeline, first extracting feature representations for each single …

An overview of Indian spoken language recognition from machine learning perspective

S Dey, M Sahidullah, G Saha - ACM Transactions on Asian and Low …, 2022 - dl.acm.org
Automatic spoken language identification (LID) is a very important research field in the era of
multilingual voice-command-based human-computer interaction. A front-end LID module …

Apkinson: the smartphone application for telemonitoring Parkinson's patients through speech, gait and hands movement

JR Orozco-Arroyave, JC Vásquez-Correa… - Neurodegenerative …, 2020 - Taylor & Francis
Aim: This paper introduces Apkinson, a mobile application for motor evaluation and
monitoring of Parkinson's disease patients. Materials & methods: The App is based on …

Dysfluency classification in stuttered speech using deep learning for real-time applications

M Jouaiti, K Dautenhahn - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Stuttering detection and classification are important issues in speech therapy as they could
help therapists track the progression of patients' dysfluencies. This is also an important tool …

Lightweight deep learning model for assessment of substitution voicing and speech after laryngeal carcinoma surgery

R Maskeliūnas, A Kulikajevas, R Damaševičius… - Cancers, 2022 - mdpi.com
Simple Summary: A total laryngectomy involves the full and permanent separation of the
upper and lower airways, resulting in the loss of voice and inability to interact vocally. To …

Towards zero-shot learning for automatic phonemic transcription

X Li, S Dalmia, D Mortensen, J Li, A Black… - Proceedings of the AAAI …, 2020 - aaai.org
Automatic phonemic transcription tools are useful for low-resource language documentation.
However, due to the lack of training sets, only a tiny fraction of languages have phonemic …

Multimodal interaction enhanced representation learning for video emotion recognition

X Xia, Y Zhao, D Jiang - Frontiers in Neuroscience, 2022 - frontiersin.org
Video emotion recognition aims to infer human emotional states from the audio, visual, and
text modalities. Previous approaches are centered around designing sophisticated fusion …

Automatic assessment of speech intelligibility using consonant similarity for head and neck cancer

S Quintas, J Mauclair, V Woisard… - … Conference: Human and …, 2022 - hal.science
The automatic prediction of speech intelligibility is a widely known problem in the context of
pathological speech. It has been seen as a growing and viable alternative to perceptual …