DRAFT: A novel framework to reduce domain shifting in self-supervised learning and its application to children's ASR

R Fan, A Alwan - arXiv preprint arXiv:2206.07931, 2022 - arxiv.org
Self-supervised learning (SSL) in the pretraining stage using un-annotated speech data has
been successful in low-resource automatic speech recognition (ASR) tasks. However …

Towards better domain adaptation for self-supervised models: A case study of child ASR

R Fan, Y Zhu, J Wang, A Alwan - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased
attention in the automatic speech recognition (ASR) community. Typical SSL methods …

Similarity analysis of self-supervised speech representations

YA Chung, Y Belinkov, J Glass - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Self-supervised speech representation learning has recently been a prosperous research
topic. Many algorithms have been proposed for learning useful representations from large …

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

K Martin, J Gauthier, C Breiss, R Levy - arXiv preprint arXiv:2306.06232, 2023 - arxiv.org
Textless self-supervised speech models have grown in capabilities in recent years, but the
nature of the linguistic information they encode has not yet been thoroughly examined. We …

Bi-APC: Bidirectional autoregressive predictive coding for unsupervised pre-training and its application to children's ASR

R Fan, A Afshan, A Alwan - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We present a bidirectional unsupervised model pre-training (UPT) method and apply it to
children's automatic speech recognition (ASR). An obstacle to improving child ASR is the …

Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement

V Ravi, J Wang, J Flint, A Alwan - Computer speech & language, 2024 - Elsevier
Speech signals are valuable biomarkers for assessing an individual's mental health,
including identifying Major Depressive Disorder (MDD) automatically. A frequently used …

Low Resource German ASR with Untranscribed Data Spoken by Non-native Children--INTERSPEECH 2021 Shared Task SPAPL System

J Wang, Y Zhu, R Fan, W Chu, A Alwan - arXiv preprint arXiv:2106.09963, 2021 - arxiv.org
This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge: Shared
Task on Automatic Speech Recognition for Non-Native Children's Speech in German. ~5 …

Attention-based conditioning methods using variable frame rate for style-robust speaker verification

A Afshan, A Alwan - arXiv preprint arXiv:2206.13680, 2022 - arxiv.org
We propose an approach to extract speaker embeddings that are robust to speaking style
variations in text-independent speaker verification. Typically, speaker embedding extraction …

Generalization of deep acoustic and NLP models for large-scale depression screening

A Harati, T Rutowski, Y Lu, P Chlebek… - Biomedical Sensing and …, 2022 - Springer
Depression is a costly and underdiagnosed global health concern, and there is a great need
for improved patient screening. Speech technology offers promise for remote screening, but …

Learning from human perception to improve automatic speaker verification in style-mismatched conditions

A Afshan, A Alwan - arXiv preprint arXiv:2206.13684, 2022 - arxiv.org
Our prior experiments show that humans and machines seem to employ different
approaches to speaker discrimination, especially in the presence of speaking style …