Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

Rethinking and improving multi-task learning for end-to-end speech translation

Y Zhang, C Xu, B Li, H Chen, T Xiao, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Significant improvements in end-to-end speech translation (ST) have been achieved
through the application of multi-task learning. However, the extent to which auxiliary tasks …

An End-to-End Automatic Speech Recognition Method Based on Multiscale Modeling (基于多尺度建模的端到端自动语音识别方法)

H Chen, R Zhang, Y Zhang, C Gao, C Xu… - Proceedings of the …, 2023 - aclanthology.org
Abstract: In recent years, deep-learning-based end-to-end automatic speech recognition models, which model speech and text directly, have gradually become mainstream thanks to their simple structure and notable performance advantages. However, because continuous speech signals and discrete text …

Soft Alignment of Modality Space for End-to-End Speech Translation

Y Zhang, K Kou, B Li, C Xu, C Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
End-to-end Speech Translation (ST) aims to convert speech into target text within a unified
model. The inherent differences between speech and text modalities often impede effective …

Efficient Speech-to-Text Translation: Progressive Pruning for Accelerated Speech Pre-trained Model

N Chen, Y Wang, X Su, F Bao - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Recently, speech pre-trained models based on the Transformer architecture have become
very popular for speech-to-text translation tasks. However, computing representation outputs …

Improving End-to-End Speech Translation with Progressive Dual Encoding

R Zhang, S Chen, Y Zhang, Y Du, H Chen… - … Conference on Natural …, 2024 - Springer
In end-to-end speech translation (E2E ST), multi-task learning is often applied due to the
scarcity of labeled ST data. However, the modality gap between speech and source text …