Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

A review on state-of-the-art Automatic Speaker verification system from spoofing and anti-spoofing perspective

A Chadha, A Abdullah… - Indian Journal …, 2021 - sciresol.s3.us-east-2.amazonaws …
Background/Objectives: The anti-spoofing measures are blooming with an aim to
protect the Automatic Speaker Verification systems from susceptible spoofing attacks. This …

A High-Quality Melody-Aware Peking Opera Synthesizer Using Data Augmentation

X Zhou, W Sun, X Shi - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
The performing art of Peking Opera places great demands on the singing skills of singers,
including pronunciation, melody, role, personal style and emotional expression, which …

Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation

KK Lakshminarayana, C Dittmar, N Pia… - 2023 31st European …, 2023 - ieeexplore.ieee.org
Many neural text-to-speech architectures can synthesize nearly natural speech from text
inputs. These architectures must be trained with tens of hours of annotated and high-quality …

Contributions to neural speech synthesis using limited data enhanced with lexical features

B Lorincz - Proc. 2021 ISCA Symposium on Security and Privacy …, 2021 - isca-archive.org
Building single or multi-speaker neural network-based text-to-speech synthesis systems
commonly relies on the availability of large amounts of high quality recordings from each …

Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System

CH Lin, JP Liao, CC Hsieh, KC Liao… - Proceedings of the 34th …, 2022 - aclanthology.org
This paper proposes a multi-speaker talking-face synthesis system. The system incorporates
voice cloning and lip-syncing technology to achieve text-to-talking-face generation by …