High-quality nonparallel voice conversion based on cycle-consistent adversarial network

WN Hsu, D Harwath, C Song, J Glass - arXiv preprint arXiv:2012.15454, 2020 - arxiv.org

In this paper we present the first model for directly synthesizing fluent, natural-sounding
spoken audio captions for images that does not require natural language text as an …

Cited by 74 - Related articles - All 8 versions

[HTML] frontiersin.org

[HTML] Audio deepfakes: A survey

Z Khanjani, G Watson, VP Janeja - Frontiers in Big Data, 2023 - frontiersin.org

A deepfake is content or material that is synthetically generated or manipulated using
artificial intelligence (AI) methods, to be passed off as real and can include audio, video …

Cited by 36 - Related articles - All 5 versions

[PDF] arxiv.org

Non-parallel voice conversion with cyclic variational autoencoder

PL Tobing, YC Wu, T Hayashi, K Kobayashi… - arXiv preprint arXiv …, 2019 - arxiv.org

In this paper, we present a novel technique for a non-parallel voice conversion (VC) with the
use of cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational …

Cited by 88 - Related articles - All 6 versions

[PDF] ieee.org

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

Cited by 59 - Related articles - All 5 versions

[BOOK][B] The science of deep learning

I Drori - 2022 - books.google.com

The Science of Deep Learning emerged from courses taught by the author that have
provided thousands of students with training and experience for their academic studies, and …

Cited by 33 - Related articles - All 5 versions

[PDF] arxiv.org

Introduction to voice presentation attack detection and recent advances

M Sahidullah, H Delgado, M Todisco, A Nautsch… - Handbook of Biometric …, 2023 - Springer

Over the past few years, significant progress has been made in the field of presentation
attack detection (PAD) for automatic speaker recognition (ASV). This includes the …

Cited by 91 - Related articles - All 18 versions

[PDF] arxiv.org

VQVAE unsupervised unit discovery and multi-scale code2spec inverter for zerospeech challenge 2019

A Tjandra, B Sisman, M Zhang, S Sakti, H Li… - arXiv preprint arXiv …, 2019 - arxiv.org

We describe our submitted system for the ZeroSpeech Challenge 2019. The current
challenge theme addresses the difficulty of constructing a speech synthesizer without any …

Cited by 87 - Related articles - All 9 versions

[PDF] arxiv.org

Speech representation disentanglement with adversarial mutual information learning for one-shot voice conversion

SC Yang, M Tantrawenith, H Zhuang, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org

One-shot voice conversion (VC) with only a single target speaker's speech for reference has
become a hot research topic. Existing works generally disentangle timbre, while information …

Cited by 32 - Related articles - All 6 versions

[PDF] arxiv.org

Nonparallel emotional speech conversion

J Gao, D Chakraborty, H Tembine… - arXiv preprint arXiv …, 2018 - arxiv.org

We propose a nonparallel data-driven emotional speech conversion method. It enables the
transfer of emotion-related characteristics of a speech signal while preserving the speaker's …

Cited by 80 - Related articles - All 11 versions

[CITATION][C] 人脸伪造及检测技术综述 (A survey of face forgery and detection techniques)

曹申豪, 刘晓辉, 毛秀青, 邹勤 - 2022 - 中国图象图形学报 (Journal of Image and Graphics)

Cited by 8 - Related articles - All 3 versions
