Understanding the role of self attention for efficient speech recognition

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net
Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as the recently proposed Conformer …

Speech representation learning combining Conformer CPC with deep cluster for the ZeroSpeech Challenge 2021

T Maekaku, X Chang, Y Fujita, LW Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a system for the Zero Resource Speech Challenge 2021, which combines
Contrastive Predictive Coding (CPC) with deep cluster. In deep cluster, we first prepare …

Updating only encoders prevents catastrophic forgetting of end-to-end ASR models

Y Takashima, S Horiguchi, S Watanabe… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present an incremental domain adaptation technique to prevent
catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model …

Decoupling recognition and transcription in Mandarin ASR

J Yuan, X Cai, D Gao, R Zheng… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end
approach. Unlike English where the writing system is closely related to sound, Chinese …

Local or global? A novel transformer for Chinese named entity recognition based on multi-view and sliding attention

Y Wang, L Lu, W Yang, Y Chen - International Journal of Machine …, 2024 - Springer
Transformer is widely used in natural language processing (NLP) tasks due to its parallelism
and ability to model long texts. However, its performance in Chinese named entity recognition …

[HTML] MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition

C Wu, H Sun, K Huang, L Wu - Sensors, 2024 - mdpi.com
This study addresses the challenges of low accuracy and high computational demands in
Tibetan speech recognition by investigating the application of end-to-end networks. We …

Conformer-enhanced hybrid CTC/Attention end-to-end Chinese speech recognition

陈戈, 谢旭康, 孙俊, 陈祺东 - Journal of Computer …, 2023 - search.ebscohost.com
Recently, the self-attention-based Transformer architecture has shown excellent performance on a range of tasks across different domains.
We explore Transformer-LAS, built from a Transformer encoder and an LAS (listen, attend and spell) decoder …

On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

N Adiga, J Park, CS Kumar, S Singh, K Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, the cascaded two-pass architecture has emerged as a strong contender for
on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal …

[CITATION][C] A survey of multimodal human-computer interaction

陶建华, 巫英才, 喻纯, 翁冬冬, 李冠君, 韩腾, 王运涛… - 2022 - Journal of Image and Graphics