Why don't people use character-level machine translation?

J Libovický, H Schmid, A Fraser - arXiv preprint arXiv:2110.08191, 2021 - arxiv.org
We present a literature and empirical survey that critically assesses the state of the art in
character-level modeling for machine translation (MT). Despite evidence in the literature that …

What Do You Get When You Cross Beam Search with Nucleus Sampling?

U Shaham, O Levy - arXiv preprint arXiv:2107.09729, 2021 - arxiv.org
We combine beam search with the probabilistic pruning technique of nucleus sampling to
create two deterministic nucleus search algorithms for natural language generation. The first …

MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy

D Yoshida, K Goyal, K Gimpel - arXiv preprint arXiv:2311.08817, 2023 - arxiv.org
It has been widely observed that exact or approximate MAP (mode-seeking) decoding from
natural language generation (NLG) models consistently leads to degenerate outputs …

Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

I Kulikov, M Eremeev, K Cho - arXiv preprint arXiv:2112.08914, 2021 - arxiv.org
Neural autoregressive sequence models smear the probability among many possible
sequences including degenerate ones, such as empty or repetitive sequences. In this work …

The Effect of Generalisation on the Inadequacy of the Mode

B Eikema - Proceedings of the 1st Workshop on Uncertainty …, 2024 - aclanthology.org
The highest probability sequences of most neural language generation models tend to be
degenerate in some way, a problem known as the inadequacy of the mode. While many …

Mode recovery in neural autoregressive sequence modeling

I Kulikov, S Welleck, K Cho - arXiv preprint arXiv:2106.05459, 2021 - arxiv.org
Despite its wide use, recent studies have revealed unexpected and undesirable properties
of neural autoregressive sequence models trained with maximum likelihood, such as an …

Interpretation Errors: Extracting Functionality From Generative Models of Language by Understanding Them Better

A Holtzman - 2023 - search.proquest.com
The rise of large language models as the workhorse of NLP, and the continuous release of
better models (OpenAI, 2023; Pichai, 2023; Schulman et al., 2022, inter alia) has created a …

Empirical Analysis of Beam Search Curse and Search Errors with Model Errors in Neural Machine Translation

J He, S Sun, X Jia, W Li - … of the 24th Annual Conference of the …, 2023 - aclanthology.org
Beam search is the most popular decoding method for Neural Machine Translation (NMT)
and is still a strong baseline compared with the newly proposed sampling-based methods …

Tradutorium: An offline cross-platform Machine Translation application

ST Stravoravdis - 2024 - pergamos.lib.uoa.gr
Machine translation has evolved significantly since the field's creation. In the last 30 years
the output quality has improved, at first with the use of statistical techniques and later neural …

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

B Liang, P Wang, Y Cao - arXiv preprint arXiv:2205.00659, 2022 - arxiv.org
Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training.
While label smoothing offers a desired regularization effect during model training, in this …