One fits all: Power general time series analysis by pretrained LM

T Zhou, P Niu, L Sun, R Jin - Advances in neural …, 2023 - proceedings.neurips.cc
Although we have witnessed great success of pre-trained models in natural language
processing (NLP) and computer vision (CV), limited progress has been made for general …

The interpolation phase transition in neural networks: Memorization and generalization under lazy training

A Montanari, Y Zhong - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics 2022, Vol. 50, No. 5, 2816–2847. https://doi.org/10.1214/22-AOS2211 …

Why robust generalization in deep learning is difficult: Perspective of expressive power

B Li, J Jin, H Zhong, J Hopcroft… - Advances in Neural …, 2022 - proceedings.neurips.cc
It is well-known that modern neural networks are vulnerable to adversarial examples. To
mitigate this problem, a series of robust learning algorithms have been proposed. However …

On the training and generalization of deep operator networks

S Lee, Y Shin - SIAM Journal on Scientific Computing, 2024 - SIAM
We present a novel training method for deep operator networks (DeepONets), one of the
most popular neural network models for operators. DeepONets are constructed by two …

Training Fully Connected Neural Networks is ∃ℝ-Complete

D Bertschinger, C Hertrich… - Advances in …, 2024 - proceedings.neurips.cc
We consider the algorithmic problem of finding the optimal weights and biases for a two-
layer fully connected neural network to fit a given set of data points, also known as empirical …

Small transformers compute universal metric embeddings

A Kratsios, V Debarnot, I Dokmanić - Journal of Machine Learning …, 2023 - jmlr.org
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate
Gaussian mixtures equipped with a transport metric (Delon and Desolneux 2020). We prove …

Minimum width for universal approximation using ReLU networks on compact domain

N Kim, C Min, S Park - arXiv preprint arXiv:2309.10402, 2023 - arxiv.org
The universal approximation property of width-bounded networks has been studied as a
dual of the classical universal approximation theorem for depth-bounded ones. There were …

Width is less important than depth in ReLU neural networks

G Vardi, G Yehudai, O Shamir - Conference on learning …, 2022 - proceedings.mlr.press
We solve an open question from Lu et al. (2017), by showing that any target network with
inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent …

Provable memorization capacity of transformers

J Kim, M Kim, B Mozafari - The Eleventh International Conference …, 2023 - openreview.net
Quantifying memorization capacity is essential for understanding the expressiveness and
generalizability of deep learning model architectures. However, the memorization capacity …

One fits all: Universal time series analysis by pretrained LM and specially designed adaptors

T Zhou, P Niu, X Wang, L Sun, R Jin - arXiv preprint arXiv:2311.14782, 2023 - arxiv.org
Despite the impressive achievements of pre-trained models in the fields of natural language
processing (NLP) and computer vision (CV), progress in the domain of time series analysis …