Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community has proposed specialized network architectures and learning-based methods to …
F Li, J Chen, X Zhang - Electronics, 2023 - mdpi.com
Non-autoregressive neural machine translation (NAMT) has received increasing attention recently by virtue of its promising acceleration paradigm for fast decoding. However, these …
J Ye, Z Zheng, Y Bao, L Qian, M Wang - arXiv preprint arXiv:2302.10025, 2023 - arxiv.org
While diffusion models have achieved great success in generating continuous signals such as images and audio, it remains elusive for diffusion models to learn discrete sequence …
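One common way to adapt diffusion to discrete sequences is an absorbing-state (masking) forward process. The sketch below is purely illustrative and not taken from the paper above; the `[MASK]` symbol and function names are assumptions:

```python
import random

MASK = "[MASK]"  # assumed absorbing state; the symbol name is illustrative

def forward_noise(tokens, t, T, rng):
    """Absorbing-state forward process: independently replace each
    token with MASK with probability t / T (t = current timestep).
    At t=0 the sequence is untouched; at t=T it is fully masked."""
    return [MASK if rng.random() < t / T else tok for tok in tokens]

rng = random.Random(0)
x0 = "the cat sat on the mat".split()
# A partially corrupted sample at the halfway timestep:
print(forward_noise(x0, t=3, T=6, rng=rng))
```

A learned reverse model would then be trained to predict the original tokens at the masked positions, step by step from t=T back to t=0.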
C Shao, Y Feng - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Non-autoregressive translation (NAT) models are typically trained with the cross-entropy loss, which forces the model outputs to be aligned verbatim with the target sentence and will …
Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment …
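CTC's many-to-one mapping from frame-level alignments to output sequences can be illustrated with its collapsing rule (merge consecutive repeats, then drop blanks). A minimal sketch, assuming `-` as the blank symbol:

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol

def ctc_collapse(alignment):
    """Map a frame-level alignment to an output sequence:
    first merge consecutive repeated tokens, then remove blanks."""
    merged = [tok for tok, _ in groupby(alignment)]
    return [tok for tok in merged if tok != BLANK]

# Several monotonic alignments collapse to the same output sequence.
print(ctc_collapse(list("cc-aa-t")))  # ['c', 'a', 't']
print(ctc_collapse(list("c-at-")))    # ['c', 'a', 't']
```

The CTC loss marginalizes over all alignments that collapse to the target, which is why it tolerates length mismatch between input frames and output tokens.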
Y Hao, Y Liu, L Mou - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an …
Non-autoregressive Transformer (NAT) is a family of text generation models that aim to reduce decoding latency by predicting whole sentences in parallel. However, such …
Y Zhou, X Lin, X Zhang, M Wang, G Jiang, H Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) has achieved significant advances in technology and research over several decades of development, and is widely used in many areas including …
C Shao, X Wu, Y Feng - arXiv preprint arXiv:2205.14333, 2022 - arxiv.org
Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: the source sentence may have multiple correct translations, but the loss function is …
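The multi-modality issue can be made concrete with a toy sketch (illustrative only, not the paper's method): a position-wise, verbatim comparison — a stand-in for token-level cross-entropy — scores a perfectly valid reordering of the reference as almost entirely wrong:

```python
def positionwise_mismatch(hyp, ref):
    """Fraction of positions where the hypothesis token differs from the
    reference token -- a proxy for the verbatim per-position alignment
    that token-level cross-entropy enforces."""
    return sum(h != r for h, r in zip(hyp, ref)) / len(ref)

ref  = "thank you very much".split()
hyp1 = "thank you very much".split()  # the reference itself
hyp2 = "very much thank you".split()  # an equally valid reordering

print(positionwise_mismatch(hyp1, ref))  # 0.0
print(positionwise_mismatch(hyp2, ref))  # 1.0 -> heavily penalized
```

Because the source sentence admits several correct targets, a loss that demands one fixed token per position pushes the model toward averaging over them, which is the failure mode these NAT papers try to address.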