Deep Speech Synthesis from MRI-Based Articulatory Representations

P Wu, T Li, Y Lu, Y Zhang, J Lian, AW Black… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we study articulatory synthesis, a speech synthesis method using human vocal
tract information that offers a way to develop efficient, generalizable and interpretable …

Audio–visual deepfake detection using articulatory representation learning

Y Wang, H Huang - Computer Vision and Image Understanding, 2024 - Elsevier
Advancements in generative artificial intelligence have made it easier to manipulate auditory
and visual elements, highlighting the critical need for robust audio–visual deepfake …

Optimizing the ultrasound tongue image representation for residual network-based articulatory-to-acoustic mapping

TG Csapó, G Gosztolya, L Tóth, AH Shandiz, A Markó - Sensors, 2022 - mdpi.com
Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply
ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used …

ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations

Z Wang, Y Wang, M Li, H Huang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
We devise an articulatory representation-based text-to-speech (TTS) model, ArtSpeech, an
explainable and effective network for human-like speech synthesis, by revisiting the sound …

Speech synthesis from three-axis accelerometer signals using conformer-based deep neural network

J Kwon, J Hwang, JE Sung, CH Im - Computers in Biology and Medicine, 2024 - Elsevier
Silent speech interfaces (SSIs) have emerged as innovative non-acoustic communication
methods, and our previous study demonstrated the significant potential of three-axis …

[PDF] Speech Synthesis from Articulatory Movements Recorded by Real-time MRI

Y Otani, S Sawada, H Ohmura… - Proceedings of the …, 2023 - isca-archive.org
Previous speech synthesis models from articulatory movements recorded using real-time
MRI (rtMRI) only predicted vocal tract shape parameters and required additional pitch …

Neural speaker embeddings for ultrasound-based silent speech interfaces

AH Shandiz, L Tóth, G Gosztolya, A Markó… - arXiv preprint arXiv …, 2021 - arxiv.org
Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the
articulatory movements, for example, an ultrasound video. Just like speech signals, these …

[PDF] A data-driven model of acoustic speech intelligibility for optimization-based models of speech production

B Elie, J Šimko, A Turk - Proceedings of Interspeech, 2024 - helda.helsinki.fi
This paper presents a data-driven model of intelligibility which is intended to be used in an
optimization-based model of speech production. The BiLSTM-based model is trained as a …

Adaptation of tongue ultrasound-based silent speech interfaces using spatial transformer networks

L Tóth, AH Shandiz, G Gosztolya, TG Csapó - arXiv preprint arXiv …, 2023 - arxiv.org
Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to
synthesize intelligible speech from articulatory movement data under certain conditions …

Lip2Speech: lightweight multi-speaker speech reconstruction with Gabor features

Z Dong, Y Xu, A Abel, D Wang - Applied Sciences, 2024 - mdpi.com
In environments characterised by noise or the absence of audio signals, visual cues, notably
facial and lip movements, serve as valuable substitutes for missing or corrupted speech …