A theory of non-linear feature learning with one gradient step in two-layer neural networks

B Moniri, D Lee, H Hassani, E Dobriban - arXiv preprint arXiv:2310.07891, 2023 - arxiv.org
Feature learning is thought to be one of the fundamental reasons for the success of deep
neural networks. It is rigorously known that in two-layer fully-connected neural networks …

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - arXiv preprint arXiv …, 2023 - arxiv.org
Feature learning, i.e. extracting meaningful representations of data, is quintessential to the
practical success of neural networks trained with gradient descent, yet it is notoriously …

Learning hierarchical polynomials with three-layer neural networks

Z Wang, E Nichani, JD Lee - arXiv preprint arXiv:2311.13774, 2023 - arxiv.org
We study the problem of learning hierarchical polynomials over the standard Gaussian
distribution with three-layer neural networks. We specifically consider target functions of the …

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol …

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks

D Beaglehole, I Mitliagkas, A Agarwala - arXiv preprint arXiv:2402.05271, 2024 - arxiv.org
Understanding the mechanisms through which neural networks extract statistics from input-
label pairs is one of the most important unsolved problems in supervised learning. Prior …

A novel domain adaptation method with physical constraints for shale gas production forecasting

L Gou, Z Yang, C Min, D Yi, X Li, B Kong - Applied Energy, 2024 - Elsevier
Effective forecasting of shale gas production is essential for optimizing exploration strategies
and guiding subsequent fracturing. However, in the new development of shale gas blocks …

Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

D Beaglehole, I Mitliagkas… - Transactions on Machine …, 2024 - openreview.net
Understanding the mechanisms through which neural networks extract statistics from input-
label pairs through feature learning is one of the most important unsolved problems in …

How Does Gradient Descent Learn Features--A Local Analysis for Regularized Two-Layer Neural Networks

M Zhou, R Ge - arXiv preprint arXiv:2406.01766, 2024 - arxiv.org
The ability to learn useful features is one of the major advantages of neural networks.
Although recent works show that neural networks can operate in a neural tangent kernel …

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

D Bu, W Huang, A Han, A Nitanda, T Suzuki… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models (LLMs) have displayed remarkable creative
prowess and emergence capabilities. Existing empirical studies have revealed a strong …

Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis

Y Li, S Sen, B Adlam - arXiv preprint arXiv:2404.12481, 2024 - arxiv.org
In the transfer learning paradigm models learn useful representations (or features) during a
data-rich pretraining stage, and then use the pretrained representation to improve model …