Understanding emergent abilities of language models from the loss perspective

Z Du, A Zeng, Y Dong, J Tang - arXiv preprint arXiv:2403.15796, 2024 - arxiv.org
Recent studies have put into question the belief that emergent abilities in language models
are exclusive to large models. This skepticism arises from two observations: 1) smaller …

How conservative are language models? adapting to the introduction of gender-neutral pronouns

S Brandl, R Cui, A Søgaard - arXiv preprint arXiv:2204.10281, 2022 - arxiv.org
Gender-neutral pronouns have recently been introduced in many languages to a) include
non-binary people and b) as a generic singular. Recent results from psycholinguistics …

Scaling laws with vocabulary: Larger models deserve larger vocabularies

C Tao, Q Liu, L Dou, N Muennighoff, Z Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Research on scaling large language models (LLMs) has primarily focused on model
parameters and training data size, overlooking the role of vocabulary size. Intuitively …

TextGram: Towards a better domain-adaptive pretraining

S Hiwarkhedkar, S Mittal, V Magdum… - … Conference on Speech …, 2023 - Springer
For green AI, it is crucial to measure and reduce the carbon footprint emitted during the
training of large language models. In NLP, performing pre-training on Transformer models …

Hierarchy and flexibility in Caenorhabditis elegans foraging

S Gupta - 2021 - digital.csic.es
Foraging is an ecologically relevant and evolutionarily ancient behavior underlying
some of the most important decisions made by all animals. Foraging in the nematode worm …