Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

S Fort, GK Dziugaite, M Paul… - Advances in …, 2020 - proceedings.neurips.cc
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …

Deep learning assisted microwave photonic dual-parameter sensing

X Tian, L Zhou, L Li, G Gunawan… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
The combination of optical microresonators and the emerging microwave photonic (MWP)
sensing has recently drawn great attention, whereas its multi-parameter sensing capability …

Neural tangent kernel beyond the infinite-width limit: Effects of depth and initialization

M Seleznova, G Kutyniok - International Conference on …, 2022 - proceedings.mlr.press
Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural
networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is …

Bidirectional learning for offline model-based biological sequence design

C Chen, Y Zhang, X Liu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Offline model-based optimization aims to maximize a black-box objective function with a
static dataset of designs and their scores. In this paper, we focus on biological sequence …

Deep networks and the multiple manifold problem

S Buchanan, D Gilboa, J Wright - arXiv preprint arXiv:2008.11245, 2020 - arxiv.org
We study the multiple manifold problem, a binary classification task modeled on applications
in machine vision, in which a deep fully-connected neural network is trained to separate two …

Towards understanding hierarchical learning: Benefits of neural representations

M Chen, Y Bai, JD Lee, T Zhao… - Advances in …, 2020 - proceedings.neurips.cc
Deep neural networks can empirically perform efficient hierarchical learning, in which the
layers learn useful representations of the data. However, how they make use of the …

LQF: Linear quadratic fine-tuning

A Achille, A Golatkar, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
Classifiers that are linear in their parameters, and trained by optimizing a convex loss
function, have predictable behavior with respect to changes in the training data, initial …

Understanding deflation process in over-parametrized tensor decomposition

R Ge, Y Ren, X Wang, M Zhou - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper we study the training dynamics for gradient flow on over-parametrized tensor
decomposition problems. Empirically, such a training process often first fits larger components …

Efficient parametric approximations of neural network function space distance

N Dhawan, S Huang, J Bae… - … Conference on Machine …, 2023 - proceedings.mlr.press
It is often useful to compactly summarize important properties of model parameters and
training data so that they can be used later without storing and/or iterating over the entire …