Inclusive Deaf Education Enabled by Artificial Intelligence: The Path to a Solution

A Coy, PS Mohammed, P Skerrit - International Journal of Artificial …, 2024 - Springer
Deaf learners in the Global South struggle to access equitable education; in particular, there
are few instances where they can be facilitated in inclusive classrooms. The challenges …

Generalized Fake Audio Detection via Deep Stable Learning

Z Wang, R Fu, Z Wen, Y Xie, Y Liu, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Although current fake audio detection approaches have achieved remarkable success on
specific datasets, they often fail when evaluated with datasets from different distributions …

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

X Wang, R Fu, Z Wen, Z Wang, Y Xie, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new
spoofing techniques. Traditional FAD methods often focus solely on distinguishing between …

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

S Shi, R Fu, Z Wen, J Tao, T Wang, C Qiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description,
playing a crucial role in media production. The text descriptions in TTA datasets lack rich …

A Survey of Deep Learning-Based Speech Synthesis Technology

张小峰, 谢钧, 罗健欣, 杨涛 - Journal of Computer …, 2021 - search.ebscohost.com
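Speech synthesis plays an important role in human-computer interaction, and advances in deep learning have driven its rapid development.
Deep learning-based speech synthesis surpasses traditional speech synthesis in both the quality and the speed of the synthesized speech …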

A Noval Feature via Color Quantisation for Fake Audio Detection

Z Wang, X Wang, Y Xie, R Fu, Z Wen… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org
In the field of deepfake detection, previous studies focus on using reconstruction or mask
and prediction methods to train pre-trained models, which are then transferred to fake audio …

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

R Fu, J Tao, Z Wen, J Yi, C Qiang, T Wang - INTERSPEECH, 2020 - isca-archive.org
Most current end-to-end speech synthesis systems assume the input text is in a single
language. However, code-switching in speech occurs frequently in routine life, in which …

A multi-speaker multi-lingual voice cloning system based on VITS2 for LIMMITS 2024 challenge

X Wang, Y Lu, X Qi, Z Wang, Y Xie, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the development of a speech synthesis system for the LIMMITS'24
Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a …

Bi-level style and prosody decoupling modeling for personalized end-to-end speech synthesis

R Fu, J Tao, Z Wen, J Yi, T Wang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
End-to-end framework can generate high-quality and high-similarity speech in the
personalized speech synthesis task. However, the generalization of out-of-domain texts is …

A Survey of Environmental Sound Source Simulation Techniques in Virtual Scenes

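程皓楠, 张加万 - Chinese Journal of Computers, 2022 - 159.226.43.17
Abstract: Environmental sounds, the most widespread class of sounds in everyday life, are an important source of external information for people.
Over the past decade or so, as users' demands for realism in virtual scenes have kept rising, creating for virtual scenes synchronized …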