Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice

Y Teytaut, A Roebel - Proceedings of Interspeech 2021, 2021 - hal.science
Phoneme-to-audio alignment is the task of synchronizing voice recordings and their related
phonetic transcripts. In this work, we introduce a new system to forced phonetic alignment …

Joint phoneme alignment and text-informed speech separation on highly corrupted speech

K Schulze-Forster, CSJ Doire… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Speech separation quality can be improved by exploiting textual information. However, this
usually requires text-to-speech alignment at phoneme level. Classical alignment methods …

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

M Penagarikano, A Varona, G Bordel… - Applied Sciences, 2023 - mdpi.com
In this paper, a semisupervised speech data extraction method is presented and applied to
create a new dataset designed for the development of fully bilingual Automatic Speech …

A linear memory CTC-based algorithm for text-to-voice alignment of very long audio recordings

G Doras, Y Teytaut, A Roebel - Applied Sciences, 2023 - mdpi.com
Synchronisation of a voice recording with the corresponding text is a common task in speech
and music processing, and is used in many practical applications (automatic subtitling …

Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation

F López, J Luque - arXiv preprint arXiv:2210.15226, 2022 - arxiv.org
High-quality data labeling from specific domains is costly and human time-consuming. In this
work, we propose a self-supervised domain adaptation method, based upon an iterative …

Sub-sync: Automatic synchronization of subtitles in the broadcasting of true live programs in Spanish

I González-Carrasco, L Puente, B Ruiz-Mezcua… - IEEE …, 2019 - ieeexplore.ieee.org
Individuals with sensory impairment (hearing or visual) encounter serious communication
barriers within society and the world around them. These barriers hinder the communication …

A Bilingual Basque–Spanish Dataset of Parliamentary Sessions for the Development and Evaluation of Speech Technology

A Varona, M Penagarikano, G Bordel… - Applied Sciences, 2024 - mdpi.com
The development of speech technology requires large amounts of data to estimate the
underlying models. Even when relying on large multilingual pre-trained models, some …

Semisupervised training of a fully bilingual ASR system for Basque and Spanish

M Penagarikano, A Varona, G Bordel… - Proceedings of the …, 2022 - researchgate.net
Automatic speech recognition (ASR) of speech signals with code-switching (an abrupt
language change common in bilingual communities) typically requires spoken language …

Research on Chinese audio and text alignment algorithm based on AIC-FCM and Doc2Vec

K Chen, J Huang, Y Cui, W Ren - ACM Transactions on Asian and Low …, 2023 - dl.acm.org
“Audiobook” is a multimedia-based reading technology that has emerged in recent years.
Realizing the alignment of e-book text and book audio is the most important part of its …

Text-To-Speech Synthesizer for English, Hindi and Marathi Spoken Signals

GD Ramteke, RJ Ramteke - British Journal of Applied Science …, 2016 - researchgate.net
The paper proposes a model of Text-To-Speech (TTS) engine for Marathi, Hindi and English
languages. The characters and their representation are analyzed and synthesized with the …