F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 253 · Related articles · All 10 versions

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press

Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …

Cited by 70 · Related articles · All 9 versions

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press

Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …

Cited by 178 · Related articles · All 10 versions

VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arXiv preprint arXiv …, 2021 - arxiv.org

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …

Cited by 113 · Related articles · All 8 versions

StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org

We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

Cited by 76 · Related articles · All 5 versions

VQVC+: One-shot voice conversion by vector quantization and U-Net architecture

DY Wu, YH Chen, HY Lee - arXiv preprint arXiv:2006.04154, 2020 - arxiv.org

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and
tones in audio into another one's while preserving the linguistic content. It is still a …

Cited by 103 · Related articles · All 7 versions

AGAIN-VC: A one-shot voice conversion using activation guidance and adaptive instance normalization

YH Chen, DY Wu, TH Wu, H Lee - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Recently, voice conversion (VC) has been widely studied. Many VC systems use
disentangle-based learning techniques to separate the speaker and the linguistic content …

Cited by 95 · Related articles · All 4 versions

Diffusion-based voice conversion with fast maximum likelihood sampling scheme

V Popov, I Vovk, V Gogoryan, T Sadekova… - arXiv preprint arXiv …, 2021 - arxiv.org

Voice conversion is a common speech synthesis task which can be solved in different ways
depending on a particular real-world scenario. The most challenging one often referred to as …

Cited by 73 · Related articles · All 4 versions

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Cited by 43 · Related articles · All 7 versions

Global prosody style transfer without text transcriptions

K Qian, Y Zhang, S Chang, J Xiong… - International …, 2021 - proceedings.mlr.press

Prosody plays an important role in characterizing the style of a speaker or an emotion, but
most non-parallel voice or emotion style transfer algorithms do not convert any prosody …

Cited by 36 · Related articles · All 8 versions
