Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …

Reinforcement learning for emotional text-to-speech synthesis with improved emotion discriminability

R Liu, B Sisman, H Li - arXiv preprint arXiv:2104.01408, 2021 - arxiv.org
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
However, the generated voice is often not perceptually identifiable by its intended emotion …

Language agnostic speaker embedding for cross-lingual personalized speech generation

Y Zhou, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice
from only a few training samples that are in a different language. One popular technique is to …

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of …

Optimization of cross-lingual voice conversion with linguistics losses to reduce foreign accents

Y Zhou, Z Wu, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to
that of a target speaker who speaks a different language. Due to the intrinsic differences …

Spike-event-driven deep spiking neural network with temporal encoding

Z Zhang, Q Liu - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
Feature extraction plays an important role before pattern recognition takes place. The
existing artificial neural networks (ANNs), however, fail to learn and represent temporal …

Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

S Ghosh, S Sarkar, S Ghosh, F Zalkow, ND Jana - Applied Intelligence, 2024 - Springer
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …

Cross-lingual voice conversion with a cycle consistency loss on linguistic representation

Y Zhou, X Tian, Z Wu, H Li - Interspeech, 2021 - isca-archive.org
Cross-Lingual Voice Conversion (XVC) aims to modify a source speaker's identity
towards that of a target while preserving the source linguistic content. This paper introduces a cycle …

A multi-task and transfer learning based approach for MOS prediction

X Tian, K Fu, S Gao, Y Gu, K Wang, W Li, Z Ma… - 2022 - isca-archive.org
Automatic speech quality assessment aims to train a model capable of automatically
measuring the performance of synthesis systems. This is a challenging task, especially …

MaskMel-Prosody-CycleGAN-VC: High-Quality Cross-Lingual Voice Conversion

S Yan, S Chen, Y Xu, D Ke - International Conference on Artificial …, 2023 - Springer
Voice conversion aims to change the timbre of the source speaker to that of the target
speaker without changing the speech content. Cross-lingual voice conversion requires …