Understanding emergent abilities of language models from the loss perspective

Z Du, A Zeng, Y Dong, J Tang - arXiv preprint arXiv:2403.15796, 2024 - arxiv.org
Recent studies have put into question the belief that emergent abilities in language models
are exclusive to large models. This skepticism arises from two observations: 1) smaller …

How conservative are language models? adapting to the introduction of gender-neutral pronouns

S Brandl, R Cui, A Søgaard - arXiv preprint arXiv:2204.10281, 2022 - arxiv.org
Gender-neutral pronouns have recently been introduced in many languages to a) include
non-binary people and b) as a generic singular. Recent results from psycholinguistics …

Scaling laws with vocabulary: Larger models deserve larger vocabularies

C Tao, Q Liu, L Dou, N Muennighoff, Z Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Research on scaling large language models (LLMs) has primarily focused on model
parameters and training data size, overlooking the role of vocabulary size. Intuitively …

TextGram: Towards a better domain-adaptive pretraining

S Hiwarkhedkar, S Mittal, V Magdum… - … Conference on Speech …, 2023 - Springer
For green AI, it is crucial to measure and reduce the carbon footprint emitted during the
training of large language models. In NLP, performing pre-training on Transformer models …

Hierarchy and flexibility in Caenorhabditis elegans foraging

S Gupta - 2021 - digital.csic.es
Foraging is an ecologically relevant and evolutionarily ancient behavior underlying
some of the most important decisions made by all animals. Foraging in the nematode worm …