Priors in Bayesian deep learning: A review

V Fortuin - International Statistical Review, 2022 - Wiley Online Library
While the choice of prior is one of the most critical parts of the Bayesian inference workflow,
recent Bayesian deep learning models have often fallen back on vague priors, such as …
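A minimal sketch of what such a vague weight prior typically looks like in practice, assuming (as is common, though the truncated snippet does not say which prior) an isotropic Gaussian over all network parameters; the function name and PyTorch usage are illustrative only, not the paper's code:

    import math
    import torch

    def log_isotropic_gaussian_prior(params, sigma=1.0):
        # Log density of an independent N(0, sigma^2) prior placed on every
        # parameter tensor in `params` (e.g. model.parameters()).
        logp = torch.tensor(0.0)
        for p in params:
            logp = logp - 0.5 * (p / sigma).pow(2).sum()
            logp = logp - p.numel() * math.log(sigma * math.sqrt(2.0 * math.pi))
        return logp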

Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Deep learning on a data diet: Finding important examples early in training

M Paul, S Ganguli… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent success in deep learning has partially been driven by training increasingly
overparametrized networks on ever larger datasets. It is therefore natural to ask: how much …

Dataset distillation with infinitely wide convolutional networks

T Nguyen, R Novak, L Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
The effectiveness of machine learning algorithms arises from being able to extract useful
features from large amounts of data. As model and dataset sizes increase, dataset …

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …

How neural networks extrapolate: From feedforward to graph neural networks

K Xu, M Zhang, J Li, SS Du, K Kawarabayashi… - arXiv preprint arXiv …, 2020 - arxiv.org
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn
outside the support of the training distribution. Previous works report mixed empirical results …

Task arithmetic in the tangent space: Improved editing of pre-trained models

G Ortiz-Jimenez, A Favero… - Advances in Neural …, 2024 - proceedings.neurips.cc
Task arithmetic has recently emerged as a cost-effective and scalable approach to edit
pre-trained models directly in weight space: by adding the fine-tuned weights of different tasks …
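The weight-space editing operation the snippet alludes to can be sketched directly. This is an illustrative reconstruction, not the authors' implementation: models are treated as plain parameter dictionaries (state dicts of tensors), a task vector is the difference between fine-tuned and pre-trained weights, and editing adds a scaled sum of task vectors back onto the pre-trained weights.

    import torch

    def task_vector(pretrained, finetuned):
        # Per-parameter difference between a fine-tuned and the pre-trained model.
        return {name: finetuned[name] - pretrained[name] for name in pretrained}

    def apply_task_vectors(pretrained, vectors, alpha=1.0):
        # Edit the pre-trained model by adding the scaled sum of task vectors.
        edited = {name: w.clone() for name, w in pretrained.items()}
        for vec in vectors:
            for name in edited:
                edited[name] = edited[name] + alpha * vec[name]
        return edited

    # Toy usage: two "models" that each consist of a single weight matrix.
    base = {"w": torch.zeros(2, 2)}
    task_a = {"w": torch.ones(2, 2)}
    edited = apply_task_vectors(base, [task_vector(base, task_a)], alpha=0.5)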

Differentially private learning needs better features (or much more data)

F Tramer, D Boneh - arXiv preprint arXiv:2011.11660, 2020 - arxiv.org
We demonstrate that differentially private machine learning has not yet reached its "AlexNet
moment" on many canonical vision tasks: linear models trained on handcrafted features …

Finite versus infinite neural networks: an empirical study

J Lee, S Schoenholz, J Pennington… - Advances in …, 2020 - proceedings.neurips.cc
We perform a careful, thorough, and large scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
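On the kernel side of that correspondence, predictions from a sufficiently wide network trained by gradient descent reduce to kernel (ridge) regression with a network-induced kernel such as the NTK. The sketch below is a generic illustration under that assumption, with the kernel matrices taken as given rather than computed from any specific architecture:

    import numpy as np

    def kernel_regression_predict(K_train, K_test_train, y_train, ridge=1e-6):
        # Kernel ridge regression: solve (K + ridge*I) alpha = y on the training
        # kernel, then predict with the test/train kernel block.
        n = K_train.shape[0]
        alpha = np.linalg.solve(K_train + ridge * np.eye(n), y_train)
        return K_test_train @ alpha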

A kernel-based view of language model fine-tuning

S Malladi, A Wettig, D Yu, D Chen… - … on Machine Learning, 2023 - proceedings.mlr.press
It has become standard to solve NLP tasks by fine-tuning pre-trained language models
(LMs), especially in low-data settings. There is minimal theoretical understanding of …