Text-free image-to-speech synthesis using learned segmental units

WN Hsu, D Harwath, C Song, J Glass - arXiv preprint arXiv:2012.15454, 2020 - arxiv.org
In this paper we present the first model for directly synthesizing fluent, natural-sounding
spoken audio captions for images that does not require natural language text as an …

[HTML] Audio deepfakes: A survey

Z Khanjani, G Watson, VP Janeja - Frontiers in Big Data, 2023 - frontiersin.org
A deepfake is content or material that is synthetically generated or manipulated using
artificial intelligence (AI) methods, to be passed off as real and can include audio, video …

Non-parallel voice conversion with cyclic variational autoencoder

PL Tobing, YC Wu, T Hayashi, K Kobayashi… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present a novel technique for a non-parallel voice conversion (VC) with the
use of cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational …

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

[BOOK][B] The science of deep learning

I Drori - 2022 - books.google.com
The Science of Deep Learning emerged from courses taught by the author that have
provided thousands of students with training and experience for their academic studies, and …

Introduction to voice presentation attack detection and recent advances

M Sahidullah, H Delgado, M Todisco, A Nautsch… - Handbook of Biometric …, 2023 - Springer
Over the past few years, significant progress has been made in the field of presentation
attack detection (PAD) for automatic speaker recognition (ASV). This includes the …

VQVAE unsupervised unit discovery and multi-scale code2spec inverter for zerospeech challenge 2019

A Tjandra, B Sisman, M Zhang, S Sakti, H Li… - arXiv preprint arXiv …, 2019 - arxiv.org
We describe our submitted system for the ZeroSpeech Challenge 2019. The current
challenge theme addresses the difficulty of constructing a speech synthesizer without any …

Speech representation disentanglement with adversarial mutual information learning for one-shot voice conversion

SC Yang, M Tantrawenith, H Zhuang, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
One-shot voice conversion (VC) with only a single target speaker's speech for reference has
become a hot research topic. Existing works generally disentangle timbre, while information …

Nonparallel emotional speech conversion

J Gao, D Chakraborty, H Tembine… - arXiv preprint arXiv …, 2018 - arxiv.org
We propose a nonparallel data-driven emotional speech conversion method. It enables the
transfer of emotion-related characteristics of a speech signal while preserving the speaker's …

[CITATION][C] A survey of face forgery and detection techniques

曹申豪, 刘晓辉, 毛秀青, 邹勤 - 2022 - Journal of Image and Graphics (中国图象图形学报)