R Zhang, S Chen, Y Zhang, Y Du, H Chen… - … Conference on Natural …, 2024 - Springer
In end-to-end speech translation (E2E ST), multi-task learning is often applied due to the
scarcity of labeled ST data. However, the modality gap between speech and source text …