Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
A pivotal challenge in machine learning is deciphering the internal representations learned by neural networks and Transformers. Building on recent …
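
The "Fourier circuit" picture this line of work formalizes can be checked numerically: networks trained on (a + b) mod p are found to represent inputs as waves and combine them with a product-to-sum identity. A minimal sketch, with p, k, and the single-frequency readout as illustrative assumptions rather than the paper's construction:

import numpy as np

# Illustrative choices: any prime p and frequency k coprime to p work.
p, k = 97, 5
a, b = 13, 61
c = np.arange(p)  # candidate answers

# A single frequency-k "Fourier circuit": the wave cos(2*pi*k*(a+b-c)/p)
# attains its maximum of 1 exactly at c = (a + b) mod p.
logits = np.cos(2 * np.pi * k * (a + b - c) / p)
assert c[np.argmax(logits)] == (a + b) % p

# Product-to-sum identity that lets a hidden layer build this from
# per-token features cos(2*pi*k*a/p) and sin(2*pi*k*a/p):
lhs = np.cos(2 * np.pi * k * (a + b) / p)
rhs = (np.cos(2 * np.pi * k * a / p) * np.cos(2 * np.pi * k * b / p)
       - np.sin(2 * np.pi * k * a / p) * np.sin(2 * np.pi * k * b / p))
assert np.isclose(lhs, rhs)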

The evolution of statistical induction heads: In-context learning Markov chains

BL Edelman, E Edelman, S Goel, E Malach… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models can generate text that mimics patterns in their inputs. We introduce a simple Markov chain sequence modeling task to study how this in …
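
A statistical induction head, in this setting, reduces to in-context bigram counting. A minimal sketch of the task and the bigram estimator, with the vocabulary size, context length, and Dirichlet prior as illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 2000  # vocabulary size and context length (illustrative)

P = rng.dirichlet(np.ones(V), size=V)  # random transition matrix, one row per state

seq = [0]
for _ in range(T - 1):
    seq.append(rng.choice(V, p=P[seq[-1]]))

# In-context bigram statistics: estimate P(next | current) from the prompt itself.
counts = np.zeros((V, V))
for s, t in zip(seq[:-1], seq[1:]):
    counts[s, t] += 1
P_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

print(np.abs(P_hat - P).max())  # error shrinks as the context grows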

Opening the black box of large language models: Two views on holistic interpretability

H Zhao, F Yang, H Lakkaraju, M Du - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
As large language models (LLMs) grow more powerful, concerns around potential harms
like toxicity, unfairness, and hallucination threaten user trust. Ensuring beneficial alignment …

Out-of-distribution generalization via composition: A lens through induction heads in transformers

J Song, Z Xu, Y Zhong - arXiv preprint arXiv:2408.09503, 2024 - arxiv.org
Large language models (LLMs) such as GPT-4 sometimes appear to be creative, often solving novel tasks with only a few demonstrations in the prompt. These tasks require the models to …
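
The induction-head mechanism studied here has a simple algorithmic core: find the previous occurrence of the current token and copy its successor. A toy, model-free sketch of that rule (the function is ours, not the paper's model):

def induction_head_predict(tokens):
    # Induction-head rule: locate the most recent earlier occurrence of
    # the current token and predict the token that followed it.
    # Returns None if the current token has not appeared before.
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

# [A][B] ... [A] -> [B]
assert induction_head_predict(["A", "B", "C", "A"]) == "B"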

Why do you grok? A theoretical analysis of grokking modular addition

MA Mohamadi, Z Li, L Wu, DJ Sutherland - arXiv preprint arXiv …, 2024 - arxiv.org
We present a theoretical explanation of the "grokking" phenomenon, where a model
generalizes long after overfitting, for the originally studied problem of modular addition. First …
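
For reference, the originally studied setup is easy to reproduce: train a small network with weight decay on a fraction of all pairs for (a + b) mod p, and test accuracy jumps long after the training loss saturates. A PyTorch sketch, with the architecture and hyperparameters as stand-ins rather than the paper's exact configuration:

import torch
import torch.nn as nn

torch.manual_seed(0)
p, frac = 97, 0.4  # modulus and train fraction (illustrative)
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
perm = torch.randperm(len(pairs))
split = int(frac * len(pairs))
train, test = pairs[perm[:split]], pairs[perm[split:]]
y_train, y_test = train.sum(1) % p, test.sum(1) % p

def onehot(ab):  # concatenated one-hot encodings of a and b
    return torch.cat([nn.functional.one_hot(ab[:, 0], p),
                      nn.functional.one_hot(ab[:, 1], p)], 1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(20000):
    loss = nn.functional.cross_entropy(model(onehot(train)), y_train)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (model(onehot(test)).argmax(1) == y_test).float().mean()
        print(step, round(loss.item(), 4), round(acc.item(), 3))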

Interpreting grokked transformers in complex modular arithmetic

H Furuta, G Minegishi, Y Iwasawa, Y Matsuo - arXiv preprint arXiv …, 2024 - arxiv.org
Grokking has been actively explored to demystify delayed generalization.
Identifying interpretable algorithms inside grokked models offers a suggestive hint to …
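
A common probe in this line of work is to Fourier-transform the learned embedding matrix over the token axis: grokked models on modular arithmetic concentrate their power on a handful of frequencies. A sketch with a random stand-in for the trained embeddings:

import numpy as np

p, d = 97, 128
E = np.random.randn(p, d)  # stand-in for a trained (p, d) embedding matrix

F = np.fft.rfft(E, axis=0)             # Fourier coefficients over the token axis
power = (np.abs(F) ** 2).sum(axis=1)   # total power per frequency
top = np.argsort(power)[::-1][:5]
print("dominant frequencies:", top)    # sparse for a grokked model, flat here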

Deep learning through a telescoping lens: A simple model provides empirical insights on grokking, gradient boosting & beyond

A Jeffares, A Curth, M van der Schaar - arXiv preprint arXiv:2411.00247, 2024 - arxiv.org
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper
understanding of its surprising behaviors, we investigate the utility of a simple yet accurate …

Approaching deep learning through the spectral dynamics of weights

D Yunis, KK Patel, S Wheeler, P Savarese… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose an empirical approach centered on the spectral dynamics of weights (the
behavior of singular values and vectors during optimization) to unify and clarify several …
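
Tracking spectral dynamics amounts to logging the singular values (and, in the paper, singular vectors) of each weight matrix over training. A minimal sketch, with the model, data, and logging cadence as illustrative choices:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(256, 10), torch.randn(256, 1)

history = []
for step in range(500):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:  # log the spectrum of every weight matrix
        history.append([torch.linalg.svdvals(w.detach())
                        for w in model.parameters() if w.ndim == 2])
# history[t][l] holds the singular values of layer l at logging step t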

Emergence in non-neural models: Grokking modular arithmetic via average gradient outer product

N Mallinar, D Beaglehole, L Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon
where the test accuracy starts improving long after the model achieves 100% training …
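
The average gradient outer product (AGOP) behind this result is M = E_x[grad f(x) grad f(x)^T], whose top eigenvectors expose the input directions the predictor actually uses. A sketch for a toy predictor (the function f is ours; the paper applies AGOP-based feature learning to kernel machines on modular arithmetic):

import torch

def f(x):  # toy scalar-output predictor standing in for a trained model
    return torch.sin(x[..., 0] * x[..., 1]) + x[..., 2]

X = torch.randn(1024, 3, requires_grad=True)
grads = torch.autograd.grad(f(X).sum(), X)[0]  # per-sample input gradients
M = grads.T @ grads / len(X)                   # (3, 3) AGOP estimate

eigvals, eigvecs = torch.linalg.eigh(M)
print(eigvals)  # top eigenvectors span the directions f depends on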

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks

D Beaglehole, I Mitliagkas, A Agarwala - arXiv preprint arXiv:2402.05271, 2024 - arxiv.org
Understanding the mechanisms through which neural networks extract statistics from input-
label pairs is one of the most important unsolved problems in supervised learning. Prior …
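
The empirical NTK in question is the Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>. A sketch that computes it for a tiny network and scores its Frobenius-cosine alignment with the first layer's feature Gram matrix; the model and the alignment measure are illustrative stand-ins, not the paper's exact quantities:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 1))
X = torch.randn(64, 5)

def param_grad(x):  # gradient of the scalar output w.r.t. all parameters
    model.zero_grad()
    model(x.unsqueeze(0)).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

J = torch.stack([param_grad(x) for x in X])  # (64, n_params) Jacobian
K = J @ J.T                                  # empirical NTK Gram matrix

def alignment(A, B):  # cosine similarity in Frobenius norm
    return (A * B).sum() / (A.norm() * B.norm())

W1 = model[0].weight.detach()                # first-layer weights (32, 5)
print(alignment(K, X @ W1.T @ W1 @ X.T))     # weight/NTK feature alignment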