Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

S Fort, GK Dziugaite, M Paul… - Advances in …, 2020 - proceedings.neurips.cc
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …

Deep learning assisted microwave photonic dual-parameter sensing

X Tian, L Zhou, L Li, G Gunawan… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
The combination of optical microresonators and the emerging microwave photonic (MWP)
sensing has recently drawn great attention, whereas its multi-parameter sensing capability …

Neural tangent kernel beyond the infinite-width limit: Effects of depth and initialization

M Seleznova, G Kutyniok - International Conference on …, 2022 - proceedings.mlr.press
Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural
networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is …

Bidirectional learning for offline model-based biological sequence design

C Chen, Y Zhang, X Liu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Offline model-based optimization aims to maximize a black-box objective function with a
static dataset of designs and their scores. In this paper, we focus on biological sequence …

Deep networks and the multiple manifold problem

S Buchanan, D Gilboa, J Wright - arXiv preprint arXiv:2008.11245, 2020 - arxiv.org
We study the multiple manifold problem, a binary classification task modeled on applications
in machine vision, in which a deep fully-connected neural network is trained to separate two …

Towards understanding hierarchical learning: Benefits of neural representations

M Chen, Y Bai, JD Lee, T Zhao… - Advances in …, 2020 - proceedings.neurips.cc
Deep neural networks can empirically perform efficient hierarchical learning, in which the
layers learn useful representations of the data. However, how they make use of the …

LQF: Linear quadratic fine-tuning

A Achille, A Golatkar, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
Classifiers that are linear in their parameters, and trained by optimizing a convex loss
function, have predictable behavior with respect to changes in the training data, initial …

Understanding deflation process in over-parametrized tensor decomposition

R Ge, Y Ren, X Wang, M Zhou - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper we study the training dynamics for gradient flow on over-parametrized tensor
decomposition problems. Empirically, such a training process often first fits larger components …

Efficient parametric approximations of neural network function space distance

N Dhawan, S Huang, J Bae… - … Conference on Machine …, 2023 - proceedings.mlr.press
It is often useful to compactly summarize important properties of model parameters and
training data so that they can be used later without storing and/or iterating over the entire …