The principal task in supervised neural machine translation (NMT) is to learn, from a set of parallel sentence pairs, to generate target sentences conditioned on the source inputs, and …
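Concretely (a worked restatement of this objective in our own notation, not the snippet's): given a parallel corpus $\mathcal{D}$ of pairs $(x, y)$, supervised NMT training minimizes the negative conditional log-likelihood of each target sentence given its source, factored autoregressively over target tokens:

$$
\mathcal{L}(\theta) \;=\; -\sum_{(x,\,y)\in\mathcal{D}} \;\sum_{t=1}^{|y|} \log p_{\theta}\!\left(y_t \mid y_{<t},\, x\right).
$$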
B Li, Z Wang, H Liu, Y Jiang, Q Du, T Xiao… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep encoders have proven effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time-consuming. Moreover, why …
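One standard way to cut that training cost, consistent with what the truncated snippet describes, is to train shallow first and then grow the encoder by stacking copies of already-trained layers. A minimal PyTorch-style sketch follows; the function name and growth schedule are ours, illustrative only, not this paper's exact method:

```python
import copy
import torch.nn as nn

def grow_encoder(layers: nn.ModuleList) -> nn.ModuleList:
    """Double encoder depth by stacking copies of the trained shallow layers.

    Illustrative sketch of shallow-to-deep growth, not any paper's exact
    schedule: the copies give the new top half a trained initialization.
    """
    return nn.ModuleList([*layers] + [copy.deepcopy(l) for l in layers])

# e.g. train a 6-layer encoder, then grow it to 12 layers and keep training,
# so most optimization steps are spent on the cheap shallow model
shallow = nn.ModuleList(nn.TransformerEncoderLayer(d_model=512, nhead=8)
                        for _ in range(6))
deep = grow_encoder(shallow)  # 12 layers, top 6 initialized from bottom 6
```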
B Li, Z Wang, H Liu, Q Du, T Xiao, C Zhang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory …
B Li, Q Du, T Zhou, Y Jing, S Zhou, X Zeng… - arXiv preprint arXiv …, 2022 - arxiv.org
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). This paper explores a deeper relationship between the Transformer and numerical ODE …
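The correspondence in the first sentence can be written out in one line: a residual block is exactly the explicit Euler method with step size $h = 1$ applied to an ODE whose right-hand side is the block function:

$$
\frac{dy}{dt} = F(y)
\quad\xrightarrow{\;\text{Euler, step } h\;}\quad
y_{n+1} = y_n + h\,F(y_n), \qquad h = 1 \;\Rightarrow\; y_{n+1} = y_n + F(y_n).
$$

Under this view a stack of residual layers traces an ODE trajectory, and higher-order solvers (e.g., Runge-Kutta schemes) suggest alternative layer designs; presumably the direction the truncated snippet pursues.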
Y Li, J Li, M Zhang - Knowledge-Based Systems, 2021 - Elsevier
Most deep neural machine translation (NMT) models are built in a bottom-up feedforward fashion, in which representations in low layers construct or modulate high …
J Yang, Y Yin, L Yang, S Ma, H Huang… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
The Transformer, built by stacking a sequence of encoder and decoder layers, has driven significant progress in neural machine translation. However, vanilla …
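For reference, the vanilla stacked structure this snippet starts from can be instantiated in a few lines of PyTorch (a generic encoder-decoder Transformer, not this paper's modified model):

```python
import torch
import torch.nn as nn

# A generic vanilla Transformer: stacked encoder and decoder layers.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)  # (src_len, batch, d_model)
tgt = torch.rand(9, 32, 512)   # (tgt_len, batch, d_model)
out = model(src, tgt)          # -> (9, 32, 512)
```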
Neural machine translation (NMT) has achieved great success due to its ability to generate high-quality sentences. Compared with human translations, one of the drawbacks of current …
An end-to-end speech-to-text (S2T) translation model is usually initialized from a pre-trained speech recognition encoder and a pre-trained text-to-text (T2T) translation decoder …
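A hedged sketch of the initialization scheme this snippet describes: load the encoder from a pre-trained speech-recognition checkpoint and the decoder from a pre-trained text translation checkpoint. The helper name, checkpoint format, and the "encoder."/"decoder." key prefixes are all assumptions for illustration:

```python
import torch

def init_s2t_model(s2t_model, asr_ckpt_path: str, t2t_ckpt_path: str):
    """Initialize a speech-to-text model from two pre-trained parents.

    Hypothetical helper: assumes both checkpoints are raw state dicts
    whose parameter names start with 'encoder.' / 'decoder.'.
    """
    asr_state = torch.load(asr_ckpt_path, map_location="cpu")
    t2t_state = torch.load(t2t_ckpt_path, map_location="cpu")

    enc = {k: v for k, v in asr_state.items() if k.startswith("encoder.")}
    dec = {k: v for k, v in t2t_state.items() if k.startswith("decoder.")}

    # strict=False: parameters found in neither parent keep random init
    missing, unexpected = s2t_model.load_state_dict({**enc, **dec},
                                                    strict=False)
    return missing, unexpected
```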
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks. However, how to leverage model capacity with large or variable depths is …
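A common mechanism behind variable-depth Transformers, offered here only as a plausible reading of the truncated snippet, is structured layer dropout (LayerDrop-style): whole layers are skipped stochastically during training, so a single model stays usable at many depths at inference. A generic sketch:

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Layer stack with structured layer dropout (LayerDrop-style).

    During training each layer is skipped with probability p, so the model
    is trained at many effective depths; at inference a subset of layers
    can be kept to trade quality for speed. Generic sketch only, not this
    paper's method.
    """
    def __init__(self, layers: nn.ModuleList, p: float = 0.2):
        super().__init__()
        self.layers = layers
        self.p = p

    def forward(self, x):
        for layer in self.layers:
            if self.training and torch.rand(()).item() < self.p:
                continue  # skipped layer acts as identity
            x = layer(x)
        return x
```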