Bridging the granularity gap for acoustic modeling

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org

Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

Cited by 20 · Related articles · All 4 versions

Rethinking and improving multi-task learning for end-to-end speech translation

Y Zhang, C Xu, B Li, H Chen, T Xiao, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Significant improvements in end-to-end speech translation (ST) have been achieved
through the application of multi-task learning. However, the extent to which auxiliary tasks …

Cited by 6 · Related articles · All 5 versions

An End-to-End Automatic Speech Recognition Method Based on Multiscale Modeling (基于多尺度建模的端到端自动语音识别方法)

H Chen, R Zhang, Y Zhang, C Gao, C Xu… - Proceedings of the …, 2023 - aclanthology.org

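Abstract: In recent years, end-to-end automatic speech recognition models based on deep learning, which model speech and text directly, have gradually become mainstream
thanks to their simple structure and significant performance advantages. However, because continuous speech signals and discrete text …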

Soft Alignment of Modality Space for End-to-End Speech Translation

Y Zhang, K Kou, B Li, C Xu, C Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

End-to-end Speech Translation (ST) aims to convert speech into target text within a unified
model. The inherent differences between speech and text modalities often impede effective …

Cited by 2 · Related articles · All 3 versions

Efficient Speech-to-Text Translation: Progressive Pruning for Accelerated Speech Pre-trained Model

N Chen, Y Wang, X Su, F Bao - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Recently, speech pre-trained models based on the Transformer architecture have become
very popular for speech-to-text translation tasks. However, computing representation outputs …

Improving End-to-End Speech Translation with Progressive Dual Encoding

R Zhang, S Chen, Y Zhang, Y Du, H Chen… - … Conference on Natural …, 2024 - Springer

In end-to-end speech translation (E2E ST), multi-task learning is often applied due to the
scarcity of labeled ST data. However, the modality gap between speech and source text …
