U Shaham, O Levy - arXiv preprint arXiv:2107.09729, 2021 - arxiv.org
We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first …
It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs …
Neural autoregressive sequence models spread probability mass across many possible sequences, including degenerate ones such as empty or repetitive sequences. In this work …
B Eikema - Proceedings of the 1st Workshop on Uncertainty …, 2024 - aclanthology.org
The highest probability sequences of most neural language generation models tend to be degenerate in some way, a problem known as the inadequacy of the mode. While many …
Despite its wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an …
The rise of large language models as the workhorse of NLP, and the continuous release of better models (OpenAI, 2023; Pichai, 2023; Schulman et al., 2022, inter alia), have created a …
J He, S Sun, X Jia, W Li - … of the 24th Annual Conference of the …, 2023 - aclanthology.org
Beam search is the most popular decoding method for Neural Machine Translation (NMT) and is still a strong baseline compared with the newly proposed sampling-based methods …
Machine translation has evolved significantly since the field's creation. In the last 30 years, output quality has improved, first through statistical techniques and later neural …
B Liang, P Wang, Y Cao - arXiv preprint arXiv:2205.00659, 2022 - arxiv.org
Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training. While label smoothing offers a desirable regularization effect during model training, in this …
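The label-smoothed training objective the snippet refers to can be sketched as a cross-entropy in which the gold token receives weight 1 - eps and the remaining eps is spread over the other classes. This is one common convention (implementations differ in whether eps is spread over V or V - 1 classes); the function name and eps=0.1 default are illustrative.

```python
import math

def label_smoothed_nll(log_probs, target, eps=0.1):
    """Cross-entropy with label smoothing: the target class gets weight
    1 - eps, and eps is spread uniformly over the remaining V - 1 classes.
    With eps = 0 this reduces to the standard negative log-likelihood."""
    V = len(log_probs)
    smooth = eps / (V - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        weight = (1 - eps) if i == target else smooth
        loss -= weight * lp
    return loss
```

Because some mass is moved onto incorrect classes, the smoothed loss is strictly larger than the plain NLL whenever the model is not uniform, which is exactly the regularization pressure against over-confident peaked distributions.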