Improving RNN transducer with normalized jointer network

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net

Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …

Cited by 55 Related articles All 2 versions

[PDF] arxiv.org

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …

Cited by 40 Related articles All 5 versions

[PDF] arxiv.org

Speech representation learning combining Conformer CPC with deep cluster for the ZeroSpeech Challenge 2021

T Maekaku, X Chang, Y Fujita, LW Chen… - arXiv preprint arXiv …, 2021 - arxiv.org

We present a system for the Zero Resource Speech Challenge 2021, which combines a
Contrastive Predictive Coding (CPC) with deep cluster. In deep cluster, we first prepare …

Cited by 15 Related articles All 9 versions

[PDF] arxiv.org

Updating only encoders prevents catastrophic forgetting of end-to-end ASR models

Y Takashima, S Horiguchi, S Watanabe… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we present an incremental domain adaptation technique to prevent
catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model …

Cited by 8 Related articles All 8 versions

[PDF] arxiv.org

Decoupling recognition and transcription in Mandarin ASR

J Yuan, X Cai, D Gao, R Zheng… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end
approach. Unlike English where the writing system is closely related to sound, Chinese …

Cited by 11 Related articles All 3 versions

[PDF] ssrn.com

Local or global? A novel transformer for Chinese named entity recognition based on multi-view and sliding attention

Y Wang, L Lu, W Yang, Y Chen - International Journal of Machine …, 2024 - Springer

Transformer is widely used in natural language processing (NLP) tasks due to the parallel
and modeling of long texts. However, its performance in Chinese named entity recognition …

Cited by 4 Related articles All 2 versions

[HTML] mdpi.com

[HTML] MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition

C Wu, H Sun, K Huang, L Wu - Sensors, 2024 - mdpi.com

This study addresses the challenges of low accuracy and high computational demands in
Tibetan speech recognition by investigating the application of end-to-end networks. We …

Conformer-enhanced hybrid CTC/Attention end-to-end Chinese speech recognition.

陈戈，谢旭康，孙俊，陈祺东 - Journal of Computer …, 2023 - search.ebscohost.com

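Recently, the self-attention-based Transformer architecture has shown very good performance on a range of tasks across different fields.
Building on a Transformer encoder and an LAS (listen, attend and spell) decoder, this work explores a Transformer-LAS …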

Cited by 2 Related articles

[PDF] arxiv.org

On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

N Adiga, J Park, CS Kumar, S Singh, K Lee… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, the cascaded two-pass architecture has emerged as a strong contender for on-
device automatic speech recognition (ASR). A cascade of causal and shallow non-causal …

[CITATION][C] A survey of multimodal human-computer interaction

陶建华，巫英才，喻纯，翁冬冬，李冠君，韩腾，王运涛… - 2022 - Journal of Image and Graphics

Cited by 6 Related articles All 5 versions
