The principal task in supervised neural machine translation (NMT) is to learn, from a set of parallel sentence pairs, to generate target sentences conditioned on the source inputs, and …
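Concretely (a worked restatement of this objective in our own notation, not the snippet's): given a parallel corpus $\mathcal{D}$ of pairs $(x, y)$, supervised NMT training minimizes the negative conditional log-likelihood of each target sentence given its source, factored autoregressively over target tokens:

$$
\mathcal{L}(\theta) \;=\; -\sum_{(x,\,y)\in\mathcal{D}} \;\sum_{t=1}^{|y|} \log p_{\theta}\!\left(y_t \mid y_{<t},\, x\right).
$$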
B Li, Z Wang, H Liu, Y Jiang, Q Du, T Xiao… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep encoders have proven effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time-consuming. Moreover, why …
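One standard way to cut that training cost, consistent with what the truncated snippet describes, is to train shallow first and then grow the encoder by stacking copies of already-trained layers. A minimal PyTorch-style sketch follows; the function name and growth schedule are ours, illustrative only, not this paper's exact method:

```python
import copy
import torch.nn as nn

def grow_encoder(layers: nn.ModuleList) -> nn.ModuleList:
    """Double encoder depth by stacking copies of the trained shallow layers.

    Illustrative sketch of shallow-to-deep growth, not any paper's exact
    schedule: the copies give the new top half a trained initialization.
    """
    return nn.ModuleList([*layers] + [copy.deepcopy(l) for l in layers])

# e.g. train a 6-layer encoder, then grow it to 12 layers and keep training,
# so most optimization steps are spent on the cheap shallow model
shallow = nn.ModuleList(nn.TransformerEncoderLayer(d_model=512, nhead=8)
                        for _ in range(6))
deep = grow_encoder(shallow)  # 12 layers, top 6 initialized from bottom 6
```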
B Li, Z Wang, H Liu, Q Du, T Xiao, C Zhang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory …
B Li, Q Du, T Zhou, Y Jing, S Zhou, X Zeng… - arXiv preprint arXiv …, 2022 - arxiv.org
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). This paper explores a deeper relationship between the Transformer and numerical ODE …
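The correspondence in the first sentence can be written out in one line: a residual block is exactly the explicit Euler method with step size $h = 1$ applied to an ODE whose right-hand side is the block function:

$$
\frac{dy}{dt} = F(y)
\quad\xrightarrow{\;\text{Euler, step } h\;}\quad
y_{n+1} = y_n + h\,F(y_n), \qquad h = 1 \;\Rightarrow\; y_{n+1} = y_n + F(y_n).
$$

Under this view a stack of residual layers traces an ODE trajectory, and higher-order solvers (e.g., Runge-Kutta schemes) suggest alternative layer designs; presumably the direction the truncated snippet pursues.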
Y Li, J Li, M Zhang - Knowledge-Based Systems, 2021 - Elsevier
Most deep neural machine translation (NMT) models are built in a bottom-up feedforward fashion, in which representations in low layers construct or modulate high …
J Yang, Y Yin, L Yang, S Ma, H Huang… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
The Transformer, built by stacking a sequence of encoder and decoder layers, has driven significant progress in neural machine translation. However, vanilla …
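For reference, the vanilla stacked structure this snippet starts from can be instantiated in a few lines of PyTorch (a generic encoder-decoder Transformer, not this paper's modified model):

```python
import torch
import torch.nn as nn

# A generic vanilla Transformer: stacked encoder and decoder layers.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)  # (src_len, batch, d_model)
tgt = torch.rand(9, 32, 512)   # (tgt_len, batch, d_model)
out = model(src, tgt)          # -> (9, 32, 512)
```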
Neural machine translation (NMT) has achieved great success due to its ability to generate high-quality sentences. Compared with human translations, one of the drawbacks of current …
An end-to-end speech-to-text (S2T) translation model is usually initialized from a pre-trained speech recognition encoder and a pre-trained text-to-text (T2T) translation decoder …
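A hedged sketch of the initialization scheme this snippet describes: load the encoder from a pre-trained speech-recognition checkpoint and the decoder from a pre-trained text translation checkpoint. The helper name, checkpoint format, and the "encoder."/"decoder." key prefixes are all assumptions for illustration:

```python
import torch

def init_s2t_model(s2t_model, asr_ckpt_path: str, t2t_ckpt_path: str):
    """Initialize a speech-to-text model from two pre-trained parents.

    Hypothetical helper: assumes both checkpoints are raw state dicts
    whose parameter names start with 'encoder.' / 'decoder.'.
    """
    asr_state = torch.load(asr_ckpt_path, map_location="cpu")
    t2t_state = torch.load(t2t_ckpt_path, map_location="cpu")

    enc = {k: v for k, v in asr_state.items() if k.startswith("encoder.")}
    dec = {k: v for k, v in t2t_state.items() if k.startswith("decoder.")}

    # strict=False: parameters found in neither parent keep random init
    missing, unexpected = s2t_model.load_state_dict({**enc, **dec},
                                                    strict=False)
    return missing, unexpected
```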
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks. However, how to leverage model capacity with large or variable depths is …
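A common mechanism behind variable-depth Transformers, offered here only as a plausible reading of the truncated snippet, is structured layer dropout (LayerDrop-style): whole layers are skipped stochastically during training, so a single model stays usable at many depths at inference. A generic sketch:

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Layer stack with structured layer dropout (LayerDrop-style).

    During training each layer is skipped with probability p, so the model
    is trained at many effective depths; at inference a subset of layers
    can be kept to trade quality for speed. Generic sketch only, not this
    paper's method.
    """
    def __init__(self, layers: nn.ModuleList, p: float = 0.2):
        super().__init__()
        self.layers = layers
        self.p = p

    def forward(self, x):
        for layer in self.layers:
            if self.training and torch.rand(()).item() < self.p:
                continue  # skipped layer acts as identity
            x = layer(x)
        return x
```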