A review of differentiable digital signal processing for music and speech synthesis

B Hayes, J Shier, G Fazekas, A McPherson… - Frontiers in Signal …, 2024 - frontiersin.org
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …

Speech synthesis from intracranial stereotactic Electroencephalography using a neural vocoder

FV Arthur, TG Csapó - INFOCOMMUNICATIONS JOURNAL: A …, 2024 - real.mtak.hu
Speech is one of the most important human biosignals. However, only some speech
production characteristics are fully understood, which are required for a successful speech …

A Smart Control System for the Oil Industry Using Text-to-Speech Synthesis Based on IIoT

AR Mandeel, AA Aggar, MS Al-Radhi, TG Csapó - Electronics, 2023 - mdpi.com
Oil refineries have high operating expenses and are often exposed to increased asset
integrity risks and functional failure. Real-time monitoring of their operations has always …

Signal Reconstruction from Mel-Spectrogram Based on Bi-Level Consistency of Full-Band Magnitude and Phase

Y Masuyama, N Ueno, N Ono - 2023 IEEE Workshop on …, 2023 - ieeexplore.ieee.org
We propose an optimization-based method for reconstructing a time-domain signal from a
low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction …

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation

HP Du, YX Lu, Y Ai, ZH Ling - arXiv preprint arXiv:2406.02162, 2024 - arxiv.org
This paper proposes a novel bidirectional neural vocoder, named BiVocoder, capable both
of feature extraction and reverse waveform generation within the short-time Fourier …

Puffin: Pitch-synchronous neural waveform generation for fullband speech on modest devices

O Watts, L Wihlborg… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We present a neural vocoder designed with low-powered Alternative and Augmentative
Communication devices in mind. By combining elements of successful modern vocoders …

ChildTinyTalks (CTT): A Benchmark Dataset and Baseline for Expressive Child Speech Synthesis

S Alwaisi, MS Al-Radhi, G Németh - International Conference on Speech …, 2024 - Springer
Designing expressive speech synthesis for child voice remains an unresolved problem. One
of the major dilemmas faced by child TTS systems and child speech synthesis is the scarcity …

Automated Child Voice Generation: Methodology and Implementation

S Alwaisi, MS Al-Radhi… - … Conference on Speech …, 2023 - ieeexplore.ieee.org
Significant progress has been made in the development of text-to-speech (TTS) models;
however, synthesizing child speech remains a challenging task. Limited research has been …

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

W Jang, D Lim, H Park - arXiv preprint arXiv:2305.10823, 2023 - arxiv.org
This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net
encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation …

Fast Synthesis of Audio Signals from Spectrogram Images in Speech Information Protection Tasks

СВ Дворянкин, НС Дворянкин… - Вопросы …, 2024 - cyberrus.info
Scientific novelty: a new spectrogram inversion method is proposed, based on
dissecting and spreading apart the image of the original spectrogram to obtain more accurate …