Training machine learning models in a meaningful order, from the easy samples to the hard ones, using curriculum learning can provide performance improvements over the standard …
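The snippet describes the core idea of curriculum learning: present easy samples before hard ones. A minimal Python sketch, assuming a hypothetical length-based difficulty proxy and a staged "competence" schedule (neither taken from the cited work):

```python
# Minimal curriculum-learning sketch (hypothetical scoring and schedule,
# not from the cited paper). Samples are sorted easy-to-hard by a
# difficulty proxy, and the training pool grows each stage.
import random

def difficulty(sample):
    # Assumed proxy: longer sentences are harder. Real curricula may
    # use model loss, word rarity, or other scores instead.
    return len(sample.split())

def curriculum_batches(samples, stages=4, batch_size=2):
    ordered = sorted(samples, key=difficulty)  # easy -> hard
    for stage in range(1, stages + 1):
        # Expose a growing prefix of the sorted data ("competence" schedule).
        pool = ordered[: max(batch_size, len(ordered) * stage // stages)]
        random.shuffle(pool)  # still shuffle within the current pool
        for i in range(0, len(pool), batch_size):
            yield stage, pool[i : i + batch_size]

samples = ["a b", "a b c d e", "a", "a b c", "a b c d e f g", "a b c d"]
for stage, batch in curriculum_batches(samples):
    print(stage, batch)
```

The only change from standard training is the sampling order; the model and loss are untouched, which is why curriculum methods add no extra computational cost.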
Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and …
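To make the speed-up intuition concrete, here is a toy contrast between autoregressive decoding (one sequential model call per token) and non-autoregressive decoding (one call for all positions). `predict_next` and `predict_all` are stand-in functions, not any surveyed model:

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
# The point is the number of *sequential* model calls, which dominates
# inference latency.

def predict_next(prefix):
    return f"tok{len(prefix)}"                 # dummy next-token prediction

def predict_all(length):
    return [f"tok{i}" for i in range(length)]  # all positions at once

def decode_autoregressive(length):
    prefix = []
    for _ in range(length):                    # `length` sequential calls
        prefix.append(predict_next(prefix))
    return prefix

def decode_non_autoregressive(length):
    return predict_all(length)                 # a single parallel call

print(decode_autoregressive(5))
print(decode_non_autoregressive(5))
```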
Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases the training of NAT models at the cost of losing …
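The KD step referred to here is usually sequence-level distillation: an autoregressive teacher re-translates the source side, and the NAT model trains on the teacher's outputs instead of the original references. A sketch under that common setup, with `teacher_translate` as a placeholder:

```python
# Sketch of sequence-level knowledge distillation as commonly used for
# NAT training. `teacher_translate` stands in for beam-search decoding
# with a trained autoregressive teacher.

def teacher_translate(src):
    return src.upper()  # placeholder for the teacher's translation

def distill_corpus(parallel_corpus):
    distilled = []
    for src, _ref in parallel_corpus:  # the original reference is discarded
        distilled.append((src, teacher_translate(src)))
    return distilled

corpus = [("ein haus", "a house"), ("ein hund", "a dog")]
print(distill_corpus(corpus))
```

The teacher's outputs are simpler and more deterministic than human references, which eases NAT training but, as the snippet notes, comes at a cost.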
Recent work on non-autoregressive neural machine translation (NAT) aims to improve efficiency through parallel decoding without sacrificing quality. However, existing NAT …
F Li, J Chen, X Zhang - Electronics, 2023 - mdpi.com
Non-autoregressive neural machine translation (NAMT) has received increasing attention recently by virtue of its promising acceleration paradigm for fast decoding. However, these …
Knowledge distillation (KD) is commonly used to construct synthetic data for training non- autoregressive translation (NAT) models. However, there exists a discrepancy on low …
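One way to see the discrepancy the snippet alludes to is to compare token frequencies in the raw references against the distilled targets; distilled text tends to drop rare words. A toy sketch with illustrative corpora and an assumed rarity cutoff of count == 1:

```python
# Sketch of measuring the low-frequency-word discrepancy between raw
# references and KD-distilled targets. Corpora and cutoff are toys.
from collections import Counter

def token_freqs(sentences):
    return Counter(tok for s in sentences for tok in s.split())

raw = ["the cat sat", "an obstreperous cat ran", "the dog sat"]
distilled = ["the cat sat", "the cat ran", "the dog sat"]

raw_f, kd_f = token_freqs(raw), token_freqs(distilled)
rare = {t for t, c in raw_f.items() if c == 1}  # assumed cutoff: count == 1
kept = {t for t in rare if kd_f[t] > 0}
print(f"rare types kept after distillation: {len(kept)}/{len(rare)}")
```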
Y Leng, X Tan, L Zhu, J Xu, R Luo… - Advances in …, 2021 - proceedings.neurips.cc
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original …
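WER is word-level edit distance normalized by reference length; a self-contained sketch to make the "lower WER than the original" claim concrete (the hypotheses below are toy strings, not model outputs):

```python
# Minimal WER computation via Levenshtein distance over words.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

ref = "the cat sat on the mat"
asr_out = "the cat sad on mat"        # raw ASR hypothesis (toy)
corrected = "the cat sat on the mat"  # after an error-correction pass (toy)
print(wer(ref, asr_out), wer(ref, corrected))
```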
J Guo, Z Zhang, L Xu, HR Wei… - Advances in Neural …, 2020 - proceedings.neurips.cc
While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively …
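The snippet cuts off before naming the method, but a common recipe for reusing a large pre-trained encoder efficiently is to freeze its weights and train only small bottleneck adapters. A PyTorch sketch of that general idea (dimensions and placement are assumptions, not this paper's exact design):

```python
# Sketch of the adapter idea: keep a large pre-trained layer frozen and
# train only a small residual bottleneck module on top of it.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

frozen_layer = nn.Linear(768, 768)  # stand-in for a pre-trained BERT layer
for p in frozen_layer.parameters():
    p.requires_grad = False         # pre-trained weights stay fixed

adapter = Adapter()                 # only these weights receive gradients
x = torch.randn(2, 10, 768)
print(adapter(frozen_layer(x)).shape)
```

Because only the adapter parameters are trained, the pre-trained model can be shared across tasks at a small per-task storage cost.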
J Guo, L Xu, E Chen - Proceedings of the 58th Annual Meeting of …, 2020 - aclanthology.org
The masked language model has received remarkable attention due to its effectiveness on various natural language processing tasks. However, few works have adopted this …
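The masked-language-model signal the snippet refers to replaces a fraction of input tokens with a [MASK] symbol and trains the model to recover them. A data-side sketch, assuming the common 15% mask rate and a toy whitespace tokenizer:

```python
# Sketch of masked-language-model input construction: some tokens are
# replaced by [MASK] and recorded as prediction targets.
import random

def mask_tokens(sentence, mask_rate=0.15, mask_token="[MASK]"):
    tokens = sentence.split()
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok          # positions the model must predict
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

random.seed(0)
print(mask_tokens("the quick brown fox jumps over the lazy dog"))
```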