Computer-assisted pronunciation training: From pronunciation scoring towards spoken language learning

NF Chen, H Li - 2016 Asia-Pacific Signal and Information …, 2016 - ieeexplore.ieee.org
This paper reviews the research approaches used in computer-assisted pronunciation
training (CAPT), addresses the existing challenges, and discusses emerging trends and …

Synthetic speech detection using fundamental frequency variation and spectral features

M Pal, D Paul, G Saha - Computer Speech & Language, 2018 - Elsevier
Recent works on the vulnerability of automatic speaker verification (ASV) systems confirm
that malicious spoofing attacks using synthetic speech can provoke a significant increase in …

The fundamental frequency variation spectrum

K Laskowski, M Heldner, J Edlund - Proceedings of FONETIK, 2008 - Citeseer
This paper describes a recently introduced vector-valued representation of fundamental
frequency variation–the FFV spectrum–which has a number of desirable properties. In …

Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing …

J Feast, A Azarbayejani, S Place - US Patent 10,276,188, 2019 - Google Patents
Systems and methods are provided for analyzing voice based audio inputs. A voice-based
audio input associated with a user (e.g., wherein the voice-based audio input is a prompt or a …

Modeling phrasing and prominence using deep recurrent learning

A Rosenberg, R Fernandez, B Ramabhadran - Interspeech, 2015 - isca-archive.org
Models for the prediction of prosodic events, such as pitch accents and phrasal
boundaries, often rely on machine learning models that combine a set of input features …

A whispered Mandarin corpus for speech technology applications

PX Lee, D Wee, HSY Toh, BP Lim, NF Chen… - …, 2014 - isca-archive.org
Whispered speech is a natural mode of speech in which voicing is absent–its acoustics differ
significantly from normally spoken speech or so-called neutral speech, such that it is …

An Effective Hierarchical Graph Attention Network Modeling Approach for Pronunciation Assessment

BC Yan, B Chen - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Automatic pronunciation assessment (APA) manages to quantify second language (L2)
learners' pronunciation proficiency in a target language by providing fine-grained feedback …

Overview of front-end features for robust speaker recognition

Q Jin, TF Zheng - Proc. APSIPA, 2011 - apsipa.org
This paper provides an overview of automatic speaker recognition technologies, with an
emphasis on front-end features for robust speaker recognition. We categorize the front-end …

Very short utterances in conversation

J Edlund, M Heldner, S Al Moubayed… - Working papers/Lund …, 2010 - journals.lub.lu.se
Faced with the difficulties of finding an operationalized definition of backchannels, we have
previously proposed an intermediate, auxiliary unit–the very short utterance (VSU)–which is …

Voice-transformation-based data augmentation for prosodic classification

R Fernandez, A Rosenberg, A Sorin… - …, Speech and Signal …, 2017 - ieeexplore.ieee.org
In this work we explore data-augmentation techniques for the task of improving the
performance of a supervised recurrent-neural-network classifier tasked with predicting …