Many-to-many cross-lingual voice conversion with a jointly trained speaker embedding network

Y Zhou, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice
from only a few training samples that are in a different language. One popular technique is to …

Cited by 18 | Related articles | All 3 versions

[PDF] arxiv.org

Optimizing voice conversion network with cycle consistency loss of speaker identity

H Du, X Tian, L Xie, H Li - 2021 IEEE Spoken language …, 2021 - ieeexplore.ieee.org

We propose a novel training scheme to optimize voice conversion network with a speaker
identity loss function. The training scheme not only minimizes frame-level spectral loss, but …

Cited by 21 | Related articles | All 4 versions

[PDF] ieee.org

Self-supervised training of speaker encoder with multi-modal diverse positive pairs

R Tao, KA Lee, RK Das… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org

We study a novel neural speaker encoder and its training strategies for speaker recognition
without using any identity labels. The speaker encoder is trained to extract a fixed …

Cited by 6 | Related articles | All 3 versions

[PDF] arxiv.org

A modularized neural network with language-specific output layers for cross-lingual voice conversion

Y Zhou, X Tian, E Yılmaz, RK Das… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

This paper presents a cross-lingual voice conversion framework that adopts a modularized
neural network. The modularized neural network has a common input structure that is …

Cited by 14 | Related articles | All 4 versions

[PDF] isca-archive.org

[PDF] Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation

Y Zhou, X Tian, Z Wu, H Li - Interspeech, 2021 - isca-archive.org

Cross-Lingual Voice Conversion (XVC) aims to modify a source speaker identity
towards a target while preserving the source linguistic content. This paper introduces a cycle …

Cited by 8 | Related articles | All 5 versions

[PDF] arxiv.org

Improving robustness of one-shot voice conversion with deep discriminative speaker encoder

H Du, L Xie - arXiv preprint arXiv:2106.10406, 2021 - arxiv.org

One-shot voice conversion has received significant attention since only one utterance from
source speaker and target speaker respectively is required. Moreover, source speaker and …

Cited by 6 | Related articles | All 5 versions

[PDF] arxiv.org

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

R Tao, KA Lee, RK Das, V Hautamäki, H Li - arXiv preprint arXiv …, 2022 - arxiv.org

We study a novel neural architecture and its training strategies of speaker encoder for
speaker recognition without using any identity labels. The speaker encoder is trained to …

Cited by 2 | Related articles | All 2 versions

[PDF] arxiv.org

Transfer learning from monolingual asr to transcription-free cross-lingual voice conversion

CJ Chang - arXiv preprint arXiv:2009.14668, 2020 - arxiv.org

Cross-lingual voice conversion (VC) is a task that aims to synthesize target voices with the
same content while source and target speakers speak in different languages. Its challenge …

Cited by 5 | Related articles | All 2 versions

MaskMel-Prosody-CycleGAN-VC: High-Quality Cross-Lingual Voice Conversion

S Yan, S Chen, Y Xu, D Ke - International Conference on Artificial …, 2023 - Springer

Voice conversion aims to change the timbre of the source speaker to that of the target
speaker without changing the speech content. The cross-lingual voice conversion requires …

Audio-Visual Active Speaker Detection and Recognition

T Ruijie - 2023 - search.proquest.com

In our daily life, humans can recognize the person based on their facial and voice
characteristics. Research in biology has proved that speech and face modalities can provide …
