Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …
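
The "specific regimes" here are the lazy / neural-tangent-kernel regime. As a rough toy illustration (not the setting analyzed in the paper; widths, learning rate, and step count are arbitrary choices), the NumPy sketch below trains only the hidden layer of a two-layer ReLU network by gradient descent and measures how far the weights move from initialization: at large width the relative movement is small, which is the sense in which the trained network stays close to its linearization and behaves like a kernel method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def relative_weight_movement(width, steps=1000, lr=0.5):
    W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden-layer weights
    a = rng.standard_normal(width) / np.sqrt(width)    # fixed output layer (NTK-style scaling)
    W0 = W.copy()
    for _ in range(steps):                             # gradient descent on the squared loss, training W only
        h = np.maximum(X @ W.T, 0.0)                   # ReLU features, shape (n, width)
        resid = h @ a - y                              # residuals, shape (n,)
        W -= lr * ((resid[:, None] * (h > 0)) * a).T @ X / n
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for width in (100, 20000):
    # relative movement typically shrinks markedly as the width grows
    print(f"width {width:6d}: relative weight movement {relative_weight_movement(width):.3f}")
```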

Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

D Stöger, M Soltanolkotabi - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently there has been significant theoretical progress on understanding the convergence
and generalization of gradient-based methods on nonconvex losses with overparameterized …
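
As a rough companion illustration (a toy fully observed matrix factorization rather than the paper's low-rank reconstruction setting; dimensions, step size, and initialization scale are arbitrary), the sketch below runs gradient descent on an overparameterized factor U from a small random initialization: the iterate U U^T stays effectively low-rank and recovers the rank-2 ground truth, mirroring the spectral-learning behaviour described in the snippet.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 30, 2
A = rng.standard_normal((d, r)) / np.sqrt(d)
M = A @ A.T                                    # rank-2 PSD ground truth

U = 1e-3 * rng.standard_normal((d, d))         # overparameterized factor, small random init
lr = 0.05
for _ in range(3000):
    R = U @ U.T - M                            # residual
    U -= lr * 2 * R @ U                        # gradient of 0.5 * ||U U^T - M||_F^2

rel_err = np.linalg.norm(U @ U.T - M) / np.linalg.norm(M)
svals = np.linalg.svd(U @ U.T, compute_uv=False)
print(f"relative recovery error: {rel_err:.2e}")
print("leading singular values of U U^T:", np.round(svals[:4], 3))  # effectively rank 2
```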

Local signal adaptivity: Provable feature learning in neural networks beyond kernels

S Karp, E Winston, Y Li, A Singh - Advances in Neural …, 2021 - proceedings.neurips.cc
Neural networks have been shown to outperform kernel methods in practice (including
neural tangent kernels). Most theoretical explanations of this performance gap focus on …

Understanding deflation process in over-parametrized tensor decomposition

R Ge, Y Ren, X Wang, M Zhou - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper we study the training dynamics for gradient flow on over-parametrized tensor
decomposition problems. Empirically, such a training process often first fits larger components …
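
A discrete-time toy version of this deflation effect (not the paper's gradient-flow analysis; dimensions, weights, and step size below are arbitrary): gradient descent on an over-parametrized symmetric rank-one decomposition of a two-component orthogonal tensor typically fits the heavier component well before the lighter one.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 10, 8                                    # ambient dimension, number of model components
a1, a2 = np.eye(d)[0], np.eye(d)[1]             # orthonormal ground-truth directions
w1, w2 = 4.0, 1.0                               # ground-truth weights, w1 > w2

def cubed(v):                                   # symmetric rank-1 tensor v (x) v (x) v
    return np.einsum('a,b,c->abc', v, v, v)

T = w1 * cubed(a1) + w2 * cubed(a2)

U = 0.1 * rng.standard_normal((m, d))           # small random init, over-parametrized (m > 2)
lr, hit1, hit2 = 0.01, None, None
for t in range(2000):
    S = np.einsum('ja,jb,jc->abc', U, U, U)     # model tensor: sum_j u_j (x) u_j (x) u_j
    grad = 3 * np.einsum('abc,jb,jc->ja', S - T, U, U)
    U -= lr * grad
    c1 = np.sum((U @ a1) ** 3)                  # recovered weight along each ground-truth direction
    c2 = np.sum((U @ a2) ** 3)
    if hit1 is None and c1 > 0.9 * w1:
        hit1 = t
    if hit2 is None and c2 > 0.9 * w2:
        hit2 = t

print(f"heavier component (weight {w1}) fit by step {hit1}")
print(f"lighter component (weight {w2}) fit by step {hit2}")   # typically much later
```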

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

N De La Vega, N Razin… - Advances in Neural …, 2024 - proceedings.neurips.cc
The question of what makes a data distribution suitable for deep learning is a fundamental
open problem. Focusing on locally connected neural networks (a prevalent family of …

Optimal gradient-based algorithms for non-concave bandit optimization

B Huang, K Huang, S Kakade, JD Lee… - Advances in …, 2021 - proceedings.neurips.cc
Bandit problems with linear or concave reward have been extensively studied, but relatively
few works have studied bandits with non-concave reward. This work considers a large family …

Behind the scenes of gradient descent: A trajectory analysis via basis function decomposition

J Ma, L Guo, S Fattahi - arXiv preprint arXiv:2210.00346, 2022 - arxiv.org
This work analyzes the solution trajectory of gradient-based algorithms via a novel basis
function decomposition. We show that, although solution trajectories of gradient-based …

Going beyond linear RL: Sample efficient neural function approximation

B Huang, K Huang, S Kakade, JD Lee… - Advances in …, 2021 - proceedings.neurips.cc
Deep Reinforcement Learning (RL) powered by neural net approximation of the Q
function has had enormous empirical success. While the theory of RL has traditionally …

Implicit regularization for group sparsity

J Li, TV Nguyen, C Hegde, RKW Wong - arXiv preprint arXiv:2301.12540, 2023 - arxiv.org
We study the implicit regularization of gradient descent towards structured sparsity via a
novel neural reparameterization, which we call a diagonally grouped linear neural network …
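
A minimal sketch of this kind of implicit sparsity bias, using the standard ungrouped Hadamard-product reparameterization x = u⊙u − v⊙v as a simplified stand-in for the paper's diagonally grouped network (problem sizes, step size, and the small initialization scale are arbitrary choices): gradient descent on an under-determined least-squares problem, started from a small initialization, typically recovers the sparse ground truth.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 50, 100, 3                        # n measurements << d features, k-sparse truth
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[:k] = [3.0, -2.0, 1.5]
y = A @ x_true

alpha = 1e-4                                # small initialization scale drives the sparsity bias
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 0.05
for _ in range(5000):
    x = u * u - v * v                       # reparameterized weights
    r = A.T @ (A @ x - y)                   # gradient of 0.5 * ||A x - y||^2 w.r.t. x
    u -= lr * 2 * r * u                     # chain rule through the reparameterization
    v += lr * 2 * r * v
x = u * u - v * v

print(f"recovery error: {np.linalg.norm(x - x_true):.2e}")
print(f"entries with |x_i| > 1e-2: {(np.abs(x) > 1e-2).sum()} (ground truth: {k})")
```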

Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

S Karnik, A Veselovska, M Iwen, F Krahmer - arXiv preprint arXiv …, 2024 - arxiv.org
We provide a rigorous analysis of implicit regularization in an overparametrized tensor
factorization problem beyond the lazy training regime. For matrix factorization problems, this …