An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Big data analytics and mining for effective visualization and trends forecasting of crime data

M Feng, J Zheng, J Ren, A Hussain, X Li, Y Xi… - IEEE Access, 2019 - ieeexplore.ieee.org
Big data analytics (BDA) is a systematic approach for analyzing and identifying different
patterns, relations, and trends within a large volume of data. In this paper, we apply BDA to …

Sequence-to-sequence acoustic modeling for voice conversion

JX Zhang, ZH Ling, LJ Liu, Y Jiang… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
In this paper, a neural network named sequence-to-sequence ConvErsion NeTwork
(SCENT) is presented for acoustic modeling in voice conversion. At training stage, a SCENT …

Predicting vacant parking space availability: A DWT-Bi-LSTM model

C Zeng, C Ma, K Wang, Z Cui - Physica A: Statistical Mechanics and its …, 2022 - Elsevier
Accurate and efficient prediction of vacant parking space availability, despite its great
significance, is no easy task. How to address the noise in the original data and how to …

Data-driven short-term forecasting for urban road network traffic based on data processing and LSTM-RNN

W Xiangxue, X Lunhui, C Kaixun - Arabian Journal for Science and …, 2019 - Springer
A short-term traffic flow prediction framework is proposed for urban road networks based on
data-driven methods that mainly include two modules. The first module contains a set of …

SNN and sound: a comprehensive review of spiking neural networks in sound

S Baek, J Lee - Biomedical Engineering Letters, 2024 - Springer
The rapid advancement of AI and machine learning has significantly enhanced sound and
acoustic recognition technologies, moving beyond traditional models to more sophisticated …

Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System

CY Chen, WZ Zheng, SS Wang, Y Tsao, PC Li… - Interspeech, 2020 - isca-archive.org
The voice conversion (VC) system is a well-known approach to improve the communication
efficiency of patients with dysarthria. In this study, we used a gated convolutional neural …

Improving sequence-to-sequence voice conversion by adding text-supervision

JX Zhang, ZH Ling, Y Jiang, LJ Liu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper presents methods of making use of text supervision to improve the performance
of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame …

Mel-spectrogram augmentation for sequence to sequence voice conversion

Y Hwang, H Cho, H Yang, DO Won, I Oh… - arXiv preprint arXiv …, 2020 - arxiv.org
For training the sequence-to-sequence voice conversion model, we need to handle an issue
of insufficient data about the number of speech pairs which consist of the same utterance …