MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling. N. Godey, R. Castagné, É. de la Clergerie, B. Sagot. Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2859–2870, 2022. Cited by 7*.
Is Anisotropy Inherent to Transformers? N. Godey, É. de la Clergerie, B. Sagot. arXiv preprint arXiv:2306.07656, 2023. Cited by 5.
Anisotropy Is Inherent to Self-Attention in Transformers. N. Godey, É. de la Clergerie, B. Sagot. arXiv preprint arXiv:2401.12143, 2024. Cited by 4.
On the Scaling Laws of Geographical Representation in Language Models. N. Godey, É. de la Clergerie, B. Sagot. arXiv preprint arXiv:2402.19406, 2024. Cited by 1.
Headless Language Models: Learning without Predicting with Contrastive Weight Tying. N. Godey, É. de la Clergerie, B. Sagot. arXiv preprint arXiv:2309.08351, 2023. Cited by 1.
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck. N. Godey, É. de la Clergerie, B. Sagot. arXiv preprint arXiv:2404.07647, 2024.