An instantaneous vector representation of delta pitch for speaker-change prediction in conversati...

Computer-assisted pronunciation training: From pronunciation scoring towards spoken language learning

NF Chen, H Li - 2016 Asia-Pacific Signal and Information …, 2016 - ieeexplore.ieee.org

This paper reviews the research approaches used in computer-assisted pronunciation
training (CAPT), addresses the existing challenges, and discusses emerging trends and …

Cited by 56 · Related articles · All 2 versions

Synthetic speech detection using fundamental frequency variation and spectral features

M Pal, D Paul, G Saha - Computer Speech & Language, 2018 - Elsevier

Recent works on the vulnerability of automatic speaker verification (ASV) systems confirm
that malicious spoofing attacks using synthetic speech can provoke significant increase in …

Cited by 60 · Related articles · All 2 versions

The fundamental frequency variation spectrum

K Laskowski, M Heldner, J Edlund - Proceedings of FONETIK, 2008 - Citeseer

This paper describes a recently introduced vector-valued representation of fundamental
frequency variation–the FFV spectrum–which has a number of desirable properties. In …

Cited by 75 · Related articles · All 22 versions

Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing …

J Feast, A Azarbayejani, S Place - US Patent 10,276,188, 2019 - Google Patents

Systems and methods are provided for analyzing voice based audio inputs. A voice-based
audio input associated with a user (e.g., wherein the voice-based audio input is a prompt or a …

Cited by 33 · Related articles · All 4 versions

Modeling phrasing and prominence using deep recurrent learning

A Rosenberg, R Fernandez, B Ramabhadran - Interspeech, 2015 - isca-archive.org

Models for the prediction of prosodic events, such as pitch accents and phrasal
boundaries, often rely on machine learning models that combine a set of input features …

Cited by 35 · Related articles · All 6 versions

A whispered Mandarin corpus for speech technology applications

PX Lee, D Wee, HSY Toh, BP Lim, NF Chen… - …, 2014 - isca-archive.org

Whispered speech is a natural mode of speech in which voicing is absent–its acoustics differ
significantly from normally spoken speech or so-called neutral speech, such that it is …

Cited by 28 · Related articles · All 5 versions

An Effective Hierarchical Graph Attention Network Modeling Approach for Pronunciation Assessment

BC Yan, B Chen - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org

Automatic pronunciation assessment (APA) manages to quantify second language (L2)
learners' pronunciation proficiency in a target language by providing fine-grained feedback …

Overview of front-end features for robust speaker recognition

Q Jin, TF Zheng - Proc. APSIPA, 2011 - apsipa.org

This paper provides an overview of automatic speaker recognition technologies, with an
emphasis on front-end features for robust speaker recognition. We categorize the frontend …

Cited by 26 · Related articles · All 2 versions

Very short utterances in conversation

J Edlund, M Heldner, S Al Moubayed… - Working papers/Lund …, 2010 - journals.lub.lu.se

Faced with the difficulties of finding an operationalized definition of backchannels, we have
previously proposed an intermediate, auxiliary unit–the very short utterance (VSU)–which is …

Cited by 28 · Related articles · All 14 versions

Voice-transformation-based data augmentation for prosodic classification

R Fernandez, A Rosenberg, A Sorin… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org

In this work we explore data-augmentation techniques for the task of improving the
performance of a supervised recurrent-neural-network classifier tasked with predicting …

Cited by 13 · Related articles · All 3 versions
