Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

H Tak, M Todisco, X Wang, J Jung, J Yamagishi… - arXiv preprint arXiv …, 2022 - arxiv.org
The performance of spoofing countermeasure systems depends fundamentally upon the use
of sufficiently representative training data. With this usually being limited, current solutions …

Exploring wav2vec 2.0 fine tuning for improved speech emotion recognition

LW Chen, A Rudnicky - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
While Wav2Vec 2.0 has been proposed for speech recognition (ASR), it can also be used for
speech emotion recognition (SER); its performance can be significantly improved using …

Automatic pronunciation assessment using self-supervised speech representation learning

E Kim, JJ Jeon, H Seo, H Kim - arXiv preprint arXiv:2204.03863, 2022 - arxiv.org
Self-supervised learning (SSL) approaches such as wav2vec 2.0 and HuBERT models have
shown promising results in various downstream tasks in the speech community. In particular …

Computer-assisted pronunciation training—Speech synthesis is almost all you need

D Korzekwa, J Lorenzo-Trueba, T Drugman… - Speech …, 2022 - Elsevier
The research community has long studied computer-assisted pronunciation training (CAPT)
methods in non-native speech. Researchers focused on studying various model …

Automatic Pronunciation Assessment--A Review

YE Kheir, A Ali, SA Chowdhury - arXiv preprint arXiv:2310.13974, 2023 - arxiv.org
Pronunciation assessment and its application in computer-aided pronunciation training
(CAPT) have seen impressive progress in recent years. With the rapid growth in language …

3M: An effective multi-view, multi-granularity, and multi-aspect modeling approach to English pronunciation assessment

FA Chao, TH Lo, TI Wu, YT Sung… - 2022 Asia-Pacific Signal …, 2022 - ieeexplore.ieee.org
As an indispensable ingredient of computer-assisted pronunciation training (CAPT),
automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed …

Proficiency assessment of L2 spoken English using wav2vec 2.0

S Bannò, M Matassoni - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
The increasing demand for learning English as a second language has led to a growing
interest in methods for automatically assessing spoken language proficiency. Most …

Improving mispronunciation detection with wav2vec2-based momentum pseudo-labeling for accentedness and intelligibility assessment

M Yang, K Hirschi, SD Looney, O Kang… - arXiv preprint arXiv …, 2022 - arxiv.org
Current leading mispronunciation detection and diagnosis (MDD) systems achieve
promising performance via end-to-end phoneme recognition. One challenge of such end-to …

Audio anti-spoofing based on audio feature fusion

J Zhang, G Tu, S Liu, Z Cai - Algorithms, 2023 - mdpi.com
The rapid development of speech synthesis technology has significantly improved the
naturalness and human-likeness of synthetic speech. As the technical barriers for speech …

Self-supervised pre-trained speech representation based end-to-end mispronunciation detection and diagnosis of Mandarin

Y Shen, Q Liu, Z Fan, J Liu, A Wumaier - IEEE Access, 2022 - ieeexplore.ieee.org
Mispronunciation Detection and Diagnosis (MDD) is an essential basic technology in
Computer-Assisted Pronunciation Training (CAPT) and Computer-Assisted Language …