Why don't people use character-level machine translation?

J Libovický, H Schmid, A Fraser - arXiv preprint arXiv:2110.08191, 2021 - arxiv.org
We present a literature and empirical survey that critically assesses the state of the art in
character-level modeling for machine translation (MT). Despite evidence in the literature that …

What Do You Get When You Cross Beam Search with Nucleus Sampling?

U Shaham, O Levy - arXiv preprint arXiv:2107.09729, 2021 - arxiv.org
We combine beam search with the probabilistic pruning technique of nucleus sampling to
create two deterministic nucleus search algorithms for natural language generation. The first …

MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy

D Yoshida, K Goyal, K Gimpel - arXiv preprint arXiv:2311.08817, 2023 - arxiv.org
It has been widely observed that exact or approximate MAP (mode-seeking) decoding from
natural language generation (NLG) models consistently leads to degenerate outputs …

Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

I Kulikov, M Eremeev, K Cho - arXiv preprint arXiv:2112.08914, 2021 - arxiv.org
Neural autoregressive sequence models smear the probability among many possible
sequences including degenerate ones, such as empty or repetitive sequences. In this work …

The Effect of Generalisation on the Inadequacy of the Mode

B Eikema - Proceedings of the 1st Workshop on Uncertainty …, 2024 - aclanthology.org
The highest probability sequences of most neural language generation models tend to be
degenerate in some way, a problem known as the inadequacy of the mode. While many …

Mode recovery in neural autoregressive sequence modeling

I Kulikov, S Welleck, K Cho - arXiv preprint arXiv:2106.05459, 2021 - arxiv.org
Despite its wide use, recent studies have revealed unexpected and undesirable properties
of neural autoregressive sequence models trained with maximum likelihood, such as an …

Interpretation Errors: Extracting Functionality From Generative Models of Language by Understanding Them Better

A Holtzman - 2023 - search.proquest.com
The rise of large language models as the workhorse of NLP, and the continuous release of
better models (OpenAI, 2023; Pichai, 2023; Schulman et al., 2022, inter alia) has created a …

Empirical Analysis of Beam Search Curse and Search Errors with Model Errors in Neural Machine Translation

J He, S Sun, X Jia, W Li - … of the 24th Annual Conference of the …, 2023 - aclanthology.org
Beam search is the most popular decoding method for Neural Machine Translation (NMT)
and is still a strong baseline compared with the newly proposed sampling-based methods …

Tradutorium: An offline cross-platform Machine Translation application

ST Stravoravdis - 2024 - pergamos.lib.uoa.gr
Machine translation has evolved significantly since the field's creation. In the last 30 years
the output quality has improved, at first with the use of statistical techniques and later neural …

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

B Liang, P Wang, Y Cao - arXiv preprint arXiv:2205.00659, 2022 - arxiv.org
Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training.
While label smoothing offers a desired regularization effect during model training, in this …