One fits all: Power general time series analysis by pretrained LM

T Zhou, P Niu, L Sun, R Jin - Advances in neural …, 2023 - proceedings.neurips.cc
Although we have witnessed great success of pre-trained models in natural language
processing (NLP) and computer vision (CV), limited progress has been made for general …

The interpolation phase transition in neural networks: Memorization and generalization under lazy training

A Montanari, Y Zhong - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics 2022, Vol. 50, No. 5, 2816–2847. https://doi.org/10.1214/22-AOS2211 …

Why robust generalization in deep learning is difficult: Perspective of expressive power

B Li, J Jin, H Zhong, J Hopcroft… - Advances in Neural …, 2022 - proceedings.neurips.cc
It is well-known that modern neural networks are vulnerable to adversarial examples. To
mitigate this problem, a series of robust learning algorithms have been proposed. However …

On the training and generalization of deep operator networks

S Lee, Y Shin - SIAM Journal on Scientific Computing, 2024 - SIAM
We present a novel training method for deep operator networks (DeepONets), one of the
most popular neural network models for operators. DeepONets are constructed by two …

Training Fully Connected Neural Networks is ∃ℝ-Complete

D Bertschinger, C Hertrich… - Advances in …, 2024 - proceedings.neurips.cc
We consider the algorithmic problem of finding the optimal weights and biases for a two-
layer fully connected neural network to fit a given set of data points, also known as empirical …

Small transformers compute universal metric embeddings

A Kratsios, V Debarnot, I Dokmanić - Journal of Machine Learning …, 2023 - jmlr.org
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate
Gaussian mixtures equipped with a transport metric (Delon and Desolneux 2020). We prove …

Minimum width for universal approximation using ReLU networks on compact domain

N Kim, C Min, S Park - arXiv preprint arXiv:2309.10402, 2023 - arxiv.org
The universal approximation property of width-bounded networks has been studied as a
dual of the classical universal approximation theorem for depth-bounded ones. There were …

Width is less important than depth in ReLU neural networks

G Vardi, G Yehudai, O Shamir - Conference on learning …, 2022 - proceedings.mlr.press
We solve an open question from Lu et al. (2017), by showing that any target network with
inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent …

Provable memorization capacity of transformers

J Kim, M Kim, B Mozafari - The Eleventh International Conference …, 2023 - openreview.net
Quantifying memorization capacity is essential for understanding the expressiveness and
generalizability of deep learning model architectures. However, the memorization capacity …

One fits all: Universal time series analysis by pretrained LM and specially designed adaptors

T Zhou, P Niu, X Wang, L Sun, R Jin - arXiv preprint arXiv:2311.14782, 2023 - arxiv.org
Despite the impressive achievements of pre-trained models in the fields of natural language
processing (NLP) and computer vision (CV), progress in the domain of time series analysis …