Hybrid autoregressive transducer (HAT)

E Variani, D Rybach, C Allauzen… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a
time-synchronous encoder-decoder model that preserves the modularity of conventional …

A unified framework for multilingual speech recognition in air traffic control systems

Y Lin, D Guo, J Zhang, Z Chen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This work focuses on robust speech recognition in air traffic control (ATC) by designing a
novel processing paradigm to integrate multilingual speech recognition into a single …

An automatic assessment system for Alzheimer's disease based on speech using feature sequence generator and recurrent neural network

YW Chien, SY Hong, WT Cheah, LH Yao, YL Chang… - Scientific Reports, 2019 - nature.com
Alzheimer disease and other dementias have become the 7th cause of death worldwide.
Still lacking a cure, an early detection of the disease in order to provide the best intervention …

DECN: Dialogical emotion correction network for conversational emotion recognition

Z Lian, B Liu, J Tao - Neurocomputing, 2021 - Elsevier
Emotion recognition in conversation (ERC) is an important research topic in artificial
intelligence. Different from the emotion estimation in individual utterances, ERC requires …

Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech

M Algabri, H Mathkour, M Alsulaiman, MA Bencherif - Mathematics, 2022 - mdpi.com
A high-performance versatile computer-assisted pronunciation training (CAPT) system that
provides the learner immediate feedback as to whether their pronunciation is correct is very …

Hierarchical multitask learning with CTC

R Sanabria, F Metze - 2018 IEEE Spoken Language …, 2018 - ieeexplore.ieee.org
In Automatic Speech Recognition, it is still challenging to learn useful intermediate
representations when using high-level (or abstract) target units such as words. For that …

Dialog-context aware end-to-end speech recognition

S Kim, F Metze - 2018 IEEE Spoken Language Technology …, 2018 - ieeexplore.ieee.org
Existing speech recognition systems are typically built at the sentence level, although it is
known that dialog context, e.g., higher-level knowledge that spans across sentences or …

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Y Lin, B Yang, L Li, D Guo, J Zhang, H Chen… - Applied Soft …, 2021 - Elsevier
In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to
tackle the issue of translating communication speech into human-readable text in air traffic …

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition

S Zhang, M Lei, Z Yan - Interspeech, 2019 - isca-archive.org
Connectionist Temporal Classification (CTC) based end-to-end speech recognition
systems usually need to incorporate an external language model by using WFST-based …

Late multimodal fusion for image and audio music transcription

M Alfaro-Contreras, JJ Valero-Mas, JM Iñesta… - Expert Systems with …, 2023 - Elsevier
Music transcription, which deals with the conversion of music sources into a structured
digital format, is a key problem for Music Information Retrieval (MIR). When addressing this …