End-to-end voice spoofing detection employing time delay neural networks and higher order statistics

J Alam, A Fathan, WH Kang - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
Technological progress and proliferation of sophisticated software have made it easier than
ever to spoof a person's voice and audio in general. Like other biometrics, speaker …

A particular character speech synthesis system based on deep learning

Y Mei, D Ye, S Jiang, J Liu - IETE Technical Review, 2021 - Taylor & Francis
The speech synthesis system of a particular character is a TTS (text-to-speech) synthetic
system, which can obtain voice with the specific speaker's voice characteristics. The …

Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

F Bous, L Benaroya, N Obin… - 2022 30th European …, 2022 - ieeexplore.ieee.org
This paper introduces voice reenactment as the task of voice conversion (VC) in which the
expressivity of the source speaker is preserved during conversion while the identity of a …

Smartphone speech privacy concerns from side-channel attacks on facial biomechanics

I Griswold-Steiner, Z LeFevre, A Serwadda - Computers & Security, 2021 - Elsevier
Speech is a complex orchestration of physical movements which involves the lungs, vocal
cords, face, jaw, and mouth. As we speak on the phone, we inadvertently impart energy on …

Generation and detection of media clones

I Echizen, N Babaguchi, J Yamagishi… - … on Information and …, 2021 - search.ieice.org
With the spread of high-performance sensors and social network services (SNS) and the
remarkable advances in machine learning technologies, fake media such as fake videos …

Voice spoofing detection using a neural networks assembly considering spectrograms and mel frequency cepstral coefficients

CA Hernández-Nava, EA Rincón-García… - PeerJ Computer …, 2023 - peerj.com
Nowadays, biometric authentication has gained relevance due to the technological
advances that have allowed its inclusion in many daily-use devices. However, this same …

Spectrum and prosody conversion for cross-lingual voice conversion with CycleGAN

Z Du, K Zhou, B Sisman, H Li - 2020 Asia-Pacific Signal and …, 2020 - ieeexplore.ieee.org
Cross-lingual voice conversion aims to change source speaker's voice to sound like that of
target speaker, when source and target speakers speak different languages. It relies on …

Every Breath You Don't Take: Deepfake Speech Detection Using Breath

S Layton, T De Andrade, D Olszewski, K Warren… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake speech represents a real and growing threat to systems and society. Many
detectors have been created to aid in defense against speech deepfakes. While these …

CCLCap-AE-AVSS: Cycle consistency loss based capsule autoencoders for audio–visual speech synthesis

S Ghosh, ND Jana, T Si, S Mallik… - Journal of Intelligent …, 2024 - degruyter.com
Audio–visual speech synthesis (AVSS) is a rapidly growing field in the paradigm of audio–
visual learning, involving the conversion of one person's speech into the audio–visual …

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

F Fang, J Yamagishi, I Echizen… - … and Security (WIFS), 2018 - ieeexplore.ieee.org
Automatic speaker verification (ASV) systems use a playback detector to filter out playback
attacks and ensure verification reliability. Since current playback detection models are …