Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

Y Yasuda, X Wang, S Takaki… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
End-to-end speech synthesis is a promising approach that directly converts raw text to
speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with …

Long range acoustic and deep features perspective on ASVspoof 2019

RK Das, J Yang, H Li - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
To secure automatic speaker verification (ASV) systems from intruders, robust
countermeasures for spoofing attack detection are required. The ASVspoof series of …

Known-unknown data augmentation strategies for detection of logical access, physical access and speech deepfake attacks: ASVspoof 2021

RK Das - Proc. 2021 Edition of the Automatic Speaker …, 2021 - isca-archive.org
The rise in demand of voice biometric systems also increases the threat from various kinds
of spoofing attacks from unauthorized users. The latest ASVspoof 2021 challenge devotes to …

Optimizing GIS partial discharge pattern recognition in the ubiquitous power internet of things context: A MixNet deep learning model

Y Wang, J Yan, Z Yang, Y Zhao, T Liu - International Journal of Electrical …, 2021 - Elsevier
Gas-insulated switchgears (GISs) are an essential component of the power system, but in
the event of a failure they may pose a serious threat to the safe operation of the entire power …

Introduction to voice presentation attack detection and recent advances

M Sahidullah, H Delgado, M Todisco, A Nautsch… - Handbook of Biometric …, 2023 - Springer
Over the past few years, significant progress has been made in the field of presentation
attack detection (PAD) for automatic speaker recognition (ASV). This includes the …

Detecting AI-Synthesized Speech Using Bispectral Analysis.

EA AlBadawy, S Lyu, H Farid - CVPR workshops, 2019 - openaccess.thecvf.com
From speech to images, and videos, advances in machine learning have led to dramatic
improvements in the quality and realism of so-called AI-synthesized content. While there are …

Evaluating voice conversion-based privacy protection against informed attackers

BML Srivastava, N Vauquier… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Speech data conveys sensitive speaker attributes like identity or accent. With a small
amount of found data, such attributes can be inferred and exploited for malicious purposes …

Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling

Y Zhou, X Tian, H Xu, RK Das… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …

Nautilus: a versatile voice cloning system

HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …

Spontaneous Conversational Speech Synthesis from Found Data.

É Székely, GE Henter, J Beskow, J Gustafson - Interspeech, 2019 - isca-archive.org
Synthesising spontaneous speech is a difficult task due to disfluencies, high variability and
syntactic conventions different from those of written language. Using found data, as opposed …