Deep Speech Synthesis from MRI-Based Articulatory Representations

P Wu, T Li, Y Lu, Y Zhang, J Lian, AW Black… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we study articulatory synthesis, a speech synthesis method using human vocal
tract information that offers a way to develop efficient, generalizable and interpretable …

Audio–visual deepfake detection using articulatory representation learning

Y Wang, H Huang - Computer Vision and Image Understanding, 2024 - Elsevier
Advancements in generative artificial intelligence have made it easier to manipulate auditory
and visual elements, highlighting the critical need for robust audio–visual deepfake …

Optimizing the ultrasound tongue image representation for residual network-based articulatory-to-acoustic mapping

TG Csapó, G Gosztolya, L Tóth, AH Shandiz, A Markó - Sensors, 2022 - mdpi.com
Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply
ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used …

ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations

Z Wang, Y Wang, M Li, H Huang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
We devise an articulatory representation-based text-to-speech (TTS) model, ArtSpeech, an
explainable and effective network for human-like speech synthesis, by revisiting the sound …

Speech synthesis from three-axis accelerometer signals using conformer-based deep neural network

J Kwon, J Hwang, JE Sung, CH Im - Computers in Biology and Medicine, 2024 - Elsevier
Silent speech interfaces (SSIs) have emerged as innovative non-acoustic communication
methods, and our previous study demonstrated the significant potential of three-axis …

[PDF] Speech Synthesis from Articulatory Movements Recorded by Real-time MRI

Y Otani, S Sawada, H Ohmura… - Proceedings of the …, 2023 - isca-archive.org
Previous speech synthesis models from articulatory movements recorded using real-time
MRI (rtMRI) only predicted vocal tract shape parameters and required additional pitch …

Neural speaker embeddings for ultrasound-based silent speech interfaces

AH Shandiz, L Tóth, G Gosztolya, A Markó… - arXiv preprint arXiv …, 2021 - arxiv.org
Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the
articulatory movements, for example, an ultrasound video. Just like speech signals, these …

[PDF] A data-driven model of acoustic speech intelligibility for optimization-based models of speech production

B Elie, J Šimko, A Turk - Proceedings of Interspeech, 2024 - helda.helsinki.fi
This paper presents a data-driven model of intelligibility which is intended to be used in an
optimization-based model of speech production. The BiLSTM-based model is trained as a …

Adaptation of tongue ultrasound-based silent speech interfaces using spatial transformer networks

L Tóth, AH Shandiz, G Gosztolya, TG Csapó - arXiv preprint arXiv …, 2023 - arxiv.org
Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to
synthesize intelligible speech from articulatory movement data under certain conditions …

Lip2Speech: lightweight multi-speaker speech reconstruction with Gabor features

Z Dong, Y Xu, A Abel, D Wang - Applied Sciences, 2024 - mdpi.com
In environments characterised by noise or the absence of audio signals, visual cues, notably
facial and lip movements, serve as valuable substitutes for missing or corrupted speech …