B Geshkovski, C Letrouit… - Advances in Neural …, 2024 - proceedings.neurips.cc
Viewing Transformers as interacting particle systems, we describe the geometry of learned
representations when the weights are not time-dependent. We show that particles …