FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter

Y Ohtani, T Okamoto, T Toda… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Some neural vocoders with fundamental frequency (f0) control have succeeded in
performing real-time inference on a single CPU while preserving the quality of the synthetic …

SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model

J Cui, Y Gu, C Weng, J Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents an advanced end-to-end singing voice synthesis (SVS) system based
on the source-filter mechanism that directly translates lyrical and melodic cues into …

Performance of Text-Independent Automatic Speaker Recognition on a Multicore System

R Kouatly, TA Khan - Tsinghua Science and Technology, 2023 - ieeexplore.ieee.org
This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR)
algorithm based on a multicore system's Gaussian Mixture Model (GMM). The high speech …

HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters

L Juvela, PP Zarazaga, GE Henter, Z Malisz - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce an end-to-end neural speech synthesis system that uses the source-filter
model of speech production. Specifically, we apply differentiable resonant filters to a glottal …

Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis

CY Yu, G Fazekas - arXiv preprint arXiv:2406.05128, 2024 - arxiv.org
Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep
learning frameworks is slow due to its recursive formulation. In addition, frame-wise …

GOLF: A Singing Voice Synthesiser with Glottal Flow Wavetables and LPC Filters

CY Yu, G Fazekas - … of the International Society for Music …, 2024 - transactions.ismir.net
This paper introduces GlOttal‑flow LPC Filter (GOLF), a novel method for singing voice
synthesis (SVS) that exploits the physical characteristics of the human voice using …

Teaching Speech Signal Processing Fundamentals in Undergraduate Class Using an Interactive GUI

R Rajan, ARA Mahadev, P Arjun… - 2024 32nd European …, 2024 - ieeexplore.ieee.org
This paper introduces an interactive GUI to teach speech signal processing fundamentals to
undergraduate students. Traditional teaching methods often struggle to convey complex …

Incorporating Cumulative Mean Normalized Difference Function Towards Interpretable Monophonic Singing Voice Pitch Extraction

X Li, C He - 2024 9th International Conference on Intelligent …, 2024 - ieeexplore.ieee.org
Pitch estimation plays an important role in various music processing and music information
retrieval applications. The traditional methods for pitch estimation contain rich prior …

Speech wave-form Driven Motion Synthesis For Embodied Agents

JH Lu - 2023 - core.ac.uk
The main objective of this thesis is to synthesise motion from speech, especially in
conversation. Based on previous research into different acoustic features or the combination …