Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion

S Seshadri, L Juvela, J Yamagishi… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Speaking style conversion (SSC) is the technology of converting natural speech signals from
one style to another. In this study, we propose the use of cycle-consistent adversarial …

ConVoice: Real-time zero-shot voice style transfer with convolutional network

Y Rebryk, S Beliaev - arXiv preprint arXiv:2005.07815, 2020 - arxiv.org
We propose a neural network for zero-shot voice conversion (VC) without any parallel or
transcribed data. Our approach uses pre-trained models for automatic speech recognition …

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

P Champion - arXiv preprint arXiv:2308.04455, 2023 - arxiv.org
The growing use of voice user interfaces has led to a surge in the collection and storage of
speech data. While data collection allows for the development of efficient tools powering …

Emotional voice conversion with cycle-consistent adversarial network

S Liu, Y Cao, H Meng - arXiv preprint arXiv:2004.03781, 2020 - arxiv.org
Emotional Voice Conversion, or emotional VC, is a technique of converting speech from one
emotion state into another one, keeping the basic linguistic information and speaker identity …

CycleGAN bandwidth extension acoustic modeling for automatic speech recognition

D Haws, X Cui - … 2019-2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org
Although narrowband (NB) and wideband (WB) speech data primarily differ in sampling rate,
these two common input sources are difficult to simultaneously model for automatic speech …

KE: A Knowledge Enhancing Framework for Machine Learning Models

Y Wang, N Shah, A Soliman, D Guo… - The Journal of …, 2023 - ACS Publications
Machine learning models are widely used in science and engineering to predict the
properties of materials and solve complex problems. However, training large models can …

Intelligibility improvement of dysarthric speech using MMSE DiscoGAN

M Purohit, M Patel, H Malaviya, A Patil… - 2020 International …, 2020 - ieeexplore.ieee.org
Dysarthria is a manifestation of the disordering in articulatory parts that are used during
speech production, which results in uneven, slow, slurred, monotone speech or speech in …

Time domain adversarial voice conversion for ADD 2022

C Wen, T Guo, X Tan, R Yan, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper, we describe our speech generation system for the first Audio Deep Synthesis
Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) …

Novel MMSE DiscoGAN for cross-domain whisper-to-speech conversion

NJ Shah, M Parmar, N Shah, HA Patil - Machine Learning in Speech …, 2018 - academia.edu
Nirmesh J. Shah, Mihir …

A compact framework for voice conversion using WaveNet conditioned on phonetic posteriorgrams

H Lu, Z Wu, R Li, S Kang, J Jia… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Voice conversion can benefit from WaveNet vocoder with improvement in converted
speech's naturalness and quality. However, nowadays approaches segregate the training of …