Contemporary approaches in evolving language models

D Oralbekova, O Mamyrbayev, M Othman… - Applied Sciences, 2023 - mdpi.com
This article provides a comprehensive survey of contemporary language modeling
approaches within the realm of natural language processing (NLP) tasks. This paper …

Predict the Next Word: <Humans Exhibit Uncertainty in this Task and Language Models _>

E Ilia, W Aziz - Proceedings of the 18th Conference of the …, 2024 - aclanthology.org
Language models (LMs) are statistical models trained to assign probability to
human-generated text. As such, it is reasonable to question whether they approximate …

EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling

S Ren, Z Wu, KQ Zhu - arXiv preprint arXiv:2310.04691, 2023 - arxiv.org
Neural language models are probabilistic models of human text. They are predominantly
trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the …

Transparency at the source: Evaluating and interpreting language models with access to the true distribution

J Jumelet, W Zuidema - arXiv preprint arXiv:2310.14840, 2023 - arxiv.org
We present a setup for training, evaluating, and interpreting neural language models that
uses artificial, language-like data. The data is generated using a massive probabilistic …

SinKD: Sinkhorn Distance Minimization for Knowledge Distillation

X Cui, Y Qin, Y Gao, E Zhang, Z Xu… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Knowledge distillation (KD) has been widely adopted to compress large language models
(LLMs). Existing KD methods investigate various divergence measures including the …

Beyond MLE: convex learning for text generation

C Shao, Z Ma, M Zhang, Y Feng - Advances in Neural …, 2023 - proceedings.neurips.cc
Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters
of a probability distribution that best explain the observed data. In the context of text …

An improved two-stage zero-shot relation triplet extraction model with hybrid cross-entropy loss and discriminative reranking

D Li, L Zhang, J Zhou, J Huang, N Xiong… - Expert Systems with …, 2025 - Elsevier
Zero-shot relation triplet extraction (ZeroRTE) aims to extract relation triplets from
unstructured text under zero-shot conditions, where the relation sets in the training and …

Finding structure in language models

J Jumelet - arXiv preprint arXiv:2411.16433, 2024 - arxiv.org
When we speak, write or listen, we continuously make predictions based on our knowledge
of a language's grammar. Remarkably, children acquire this grammatical knowledge within …

FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios

K Pang - arXiv preprint arXiv:2412.19652, 2024 - arxiv.org
Linguistic steganography embeds secret information in seemingly innocent texts,
safeguarding privacy in surveillance environments. Generative linguistic steganography …