Focusing on attention: prosody transfer and adaptative optimization strategy for multi-speaker...

A Coy, PS Mohammed, P Skerrit - International Journal of Artificial …, 2024 - Springer

Deaf learners in the Global South struggle to access equitable education; in particular, there
are few instances where they can be facilitated in inclusive classrooms. The challenges …

Cited by 5 · Related articles

Generalized Fake Audio Detection via Deep Stable Learning

Z Wang, R Fu, Z Wen, Y Xie, Y Liu, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Although current fake audio detection approaches have achieved remarkable success on
specific datasets, they often fail when evaluated with datasets from different distributions …

Cited by 6 · Related articles · All 2 versions

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

X Wang, R Fu, Z Wen, Z Wang, Y Xie, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new
spoofing techniques. Traditional FAD methods often focus solely on distinguishing between …

Cited by 5 · Related articles · All 2 versions

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

S Shi, R Fu, Z Wen, J Tao, T Wang, C Qiang… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description,
playing a crucial role in media production. The text descriptions in TTA datasets lack rich …

A Survey of Speech Synthesis Technology Based on Deep Learning (深度学习语音合成技术综述)

X Zhang, J Xie, J Luo, T Yang - Journal of Computer …, 2021 - search.ebscohost.com

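Speech synthesis technology plays an important role in human-computer interaction, and advances in deep learning have driven its rapid development.
Deep-learning-based speech synthesis surpasses traditional speech synthesis in both the quality and the speed of the synthesized speech …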

Cited by 1 · Related articles

A Noval Feature via Color Quantisation for Fake Audio Detection

Z Wang, X Wang, Y Xie, R Fu, Z Wen… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org

In the field of deepfake detection, previous studies focus on using reconstruction or mask
and prediction methods to train pre-trained models, which are then transferred to fake audio …

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

R Fu, J Tao, Z Wen, J Yi, C Qiang, T Wang - INTERSPEECH, 2020 - isca-archive.org

Most current end-to-end speech synthesis systems assume that the input text is in a single language.
However, code-switching occurs frequently in everyday speech, in which …

Cited by 7 · Related articles · All 6 versions

A multi-speaker multi-lingual voice cloning system based on VITS2 for LIMMITS 2024 challenge

X Wang, Y Lu, X Qi, Z Wang, Y Xie, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents the development of a speech synthesis system for the LIMMITS'24
Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a …

Bi-level style and prosody decoupling modeling for personalized end-to-end speech synthesis

R Fu, J Tao, Z Wen, J Yi, T Wang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

End-to-end frameworks can generate high-quality, high-similarity speech in personalized
speech synthesis. However, generalization to out-of-domain texts is …

Cited by 2 · Related articles · All 2 versions

A Survey of Environmental Sound Source Simulation Techniques in Virtual Scenes (虚拟场景中环境声源仿真技术综述)

H Cheng, J Zhang - Chinese Journal of Computers (计算机学报), 2022 - 159.226.43.17

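Environmental sound, the most widely distributed type of sound in daily life, is an important source of external information for people.
Over the past decade or more, as users' demands for realism in virtual scenes have kept rising, building synchronized …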
