High-quality nonparallel voice conversion based on cycle-consistent adversarial network

S Seshadri, L Juvela, J Yamagishi… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

Speaking style conversion (SSC) is the technology of converting natural speech signals from
one style to another. In this study, we propose the use of cycle-consistent adversarial …

Cited by 21 - Related articles - All 11 versions

ConVoice: Real-time zero-shot voice style transfer with convolutional network

Y Rebryk, S Beliaev - arXiv preprint arXiv:2005.07815, 2020 - arxiv.org

We propose a neural network for zero-shot voice conversion (VC) without any parallel or
transcribed data. Our approach uses pre-trained models for automatic speech recognition …

Cited by 16 - Related articles - All 2 versions

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

P Champion - arXiv preprint arXiv:2308.04455, 2023 - arxiv.org

The growing use of voice user interfaces has led to a surge in the collection and storage of
speech data. While data collection allows for the development of efficient tools powering …

Cited by 2 - Related articles - All 9 versions

Emotional voice conversion with cycle-consistent adversarial network

S Liu, Y Cao, H Meng - arXiv preprint arXiv:2004.03781, 2020 - arxiv.org

Emotional Voice Conversion, or emotional VC, is a technique of converting speech from one
emotion state into another one, keeping the basic linguistic information and speaker identity …

Cited by 13 - Related articles - All 2 versions

CycleGAN bandwidth extension acoustic modeling for automatic speech recognition

D Haws, X Cui - … 2019-2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org

Although narrowband (NB) and wideband (WB) speech data primarily differ in sampling rate,
these two common input sources are difficult to simultaneously model for automatic speech …

Cited by 16 - Related articles - All 2 versions

KE: A Knowledge Enhancing Framework for Machine Learning Models

Y Wang, N Shah, A Soliman, D Guo… - The Journal of …, 2023 - ACS Publications

Machine learning models are widely used in science and engineering to predict the
properties of materials and solve complex problems. However, training large models can …

Intelligibility improvement of dysarthric speech using MMSE DiscoGAN

M Purohit, M Patel, H Malaviya, A Patil… - 2020 International …, 2020 - ieeexplore.ieee.org

Dysarthria is a manifestation of the disordering in articulatory parts that are used during
speech production, which results in uneven, slow, slurred, monotone speech or speech in …

Cited by 11 - Related articles - All 4 versions

Time domain adversarial voice conversion for ADD 2022

C Wen, T Guo, X Tan, R Yan, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we describe our speech generation system for the first Audio Deep Synthesis
Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) …

Cited by 4 - Related articles - All 3 versions

[PDF] Novel MMSE DiscoGAN for cross-domain whisper-to-speech conversion

NJ Shah, M Parmar, N Shah, HA Patil - Machine Learning in Speech …, 2018 - academia.edu

Novel MMSE DiscoGAN for Cross-Domain Whisper-to-Speech Conversion Page 1 Novel
MMSE DiscoGAN for Cross-Domain Whisper-to-Speech Conversion Nirmesh J. Shah, Mihir …

Cited by 17 - Related articles - All 3 versions

A compact framework for voice conversion using WaveNet conditioned on phonetic posteriorgrams

H Lu, Z Wu, R Li, S Kang, J Jia… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

Voice conversion can benefit from WaveNet vocoder with improvement in converted
speech's naturalness and quality. However, nowadays approaches segregate the training of …

Cited by 14 - Related articles - All 6 versions