Z Wang, R Fu, Z Wen, Y Xie, Y Liu, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions …
X Wang, R Fu, Z Wen, Z Wang, Y Xie, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between …
S Shi, R Fu, Z Wen, J Tao, T Wang, C Qiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich …
In the field of deepfake detection, previous studies focus on using reconstruction or mask-and-prediction methods to train pre-trained models, which are then transferred to fake audio …
Most current end-to-end speech synthesis systems assume that the input text is in a single language. However, code-switching occurs frequently in everyday speech, in which …
X Wang, Y Lu, X Qi, Z Wang, Y Xie, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a …
R Fu, J Tao, Z Wen, J Yi, T Wang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
End-to-end frameworks can generate high-quality, high-similarity speech in the personalized speech synthesis task. However, the generalization to out-of-domain texts is …