Comparison of decoding strategies for CTC acoustic models

E Variani, D Rybach, C Allauzen… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a
time-synchronous encoder-decoder model that preserves the modularity of conventional …

Cited by 170 · Related articles · All 6 versions

A unified framework for multilingual speech recognition in air traffic control systems

Y Lin, D Guo, J Zhang, Z Chen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

This work focuses on robust speech recognition in air traffic control (ATC) by designing a
novel processing paradigm to integrate multilingual speech recognition into a single …

Cited by 72 · Related articles · All 4 versions

An automatic assessment system for Alzheimer's disease based on speech using feature sequence generator and recurrent neural network

YW Chien, SY Hong, WT Cheah, LH Yao, YL Chang… - Scientific Reports, 2019 - nature.com

Alzheimer disease and other dementias have become the 7th cause of death worldwide.
Still lacking a cure, an early detection of the disease in order to provide the best intervention …

Cited by 68 · Related articles · All 8 versions

DECN: Dialogical emotion correction network for conversational emotion recognition

Z Lian, B Liu, J Tao - Neurocomputing, 2021 - Elsevier

Emotion recognition in conversation (ERC) is an important research topic in artificial
intelligence. Different from the emotion estimation in individual utterances, ERC requires …

Cited by 40 · Related articles · All 2 versions

Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech

M Algabri, H Mathkour, M Alsulaiman, MA Bencherif - Mathematics, 2022 - mdpi.com

A high-performance versatile computer-assisted pronunciation training (CAPT) system that
provides the learner immediate feedback as to whether their pronunciation is correct is very …

Cited by 24 · Related articles · All 7 versions

Hierarchical multitask learning with CTC

R Sanabria, F Metze - 2018 IEEE Spoken Language …, 2018 - ieeexplore.ieee.org

In Automatic Speech Recognition, it is still challenging to learn useful intermediate
representations when using high-level (or abstract) target units such as words. For that …

Cited by 74 · Related articles · All 4 versions

Dialog-context aware end-to-end speech recognition

S Kim, F Metze - 2018 IEEE Spoken Language Technology …, 2018 - ieeexplore.ieee.org

Existing speech recognition systems are typically built at the sentence level, although it is
known that dialog context, e.g. higher-level knowledge that spans across sentences or …

Cited by 51 · Related articles · All 3 versions

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Y Lin, B Yang, L Li, D Guo, J Zhang, H Chen… - Applied Soft …, 2021 - Elsevier

In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to
tackle the issue of translating communication speech into human-readable text in air traffic …

Cited by 35 · Related articles · All 4 versions

[PDF] Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition

S Zhang, M Lei, Z Yan - Interspeech, 2019 - isca-archive.org

Connectionist Temporal Classification (CTC) based end-to-end speech recognition
system usually need to incorporate an external language model by using WFST-based …

Cited by 40 · Related articles · All 4 versions

[HTML] Late multimodal fusion for image and audio music transcription

M Alfaro-Contreras, JJ Valero-Mas, JM Iñesta… - Expert Systems with …, 2023 - Elsevier

Music transcription, which deals with the conversion of music sources into a structured
digital format, is a key problem for Music Information Retrieval (MIR). When addressing this …

Cited by 18 · Related articles · All 8 versions
