Quantitative CLTs in deep neural networks

S Favaro, B Hanin, D Marinucci, I Nourdin… - arXiv preprint arXiv …, 2023 - arxiv.org
We study the distribution of a fully connected neural network with random Gaussian weights
and biases in which the hidden layer widths are proportional to a large constant $n$. Under …
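
For context, a minimal numerical sketch of the setting this entry describes: a fully connected network with Gaussian weights and biases at growing hidden width, whose output distribution is compared against a matched Gaussian. The ReLU activation, 1/sqrt(fan-in) scaling, depth, and the 1-D Wasserstein-1 diagnostic are illustrative assumptions, not the paper's construction.

```python
# Crude diagnostic (not the paper's bound): sample scalar outputs of many
# independently initialized random ReLU networks at width n and measure the
# 1-D Wasserstein-1 distance to Gaussian samples with matched mean/std.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def random_relu_net_output(x, n, depth, n_samples):
    """Scalar outputs of `n_samples` independently initialized networks."""
    outs = np.empty(n_samples)
    for s in range(n_samples):
        h = x.copy()
        for _ in range(depth):
            W = rng.normal(0.0, 1.0, size=(n, h.shape[0])) / np.sqrt(h.shape[0])
            b = rng.normal(0.0, 0.1, size=n)
            h = np.maximum(W @ h + b, 0.0)   # ReLU hidden layer
        w_out = rng.normal(0.0, 1.0, size=h.shape[0]) / np.sqrt(h.shape[0])
        outs[s] = w_out @ h
    return outs

x = rng.normal(size=10)            # a fixed input
for n in (8, 64, 512):             # growing hidden width
    y = random_relu_net_output(x, n, depth=3, n_samples=2000)
    g = rng.normal(y.mean(), y.std(), size=2000)   # matched Gaussian samples
    print(n, wasserstein_distance(y, g))
```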

Quantitative Gaussian approximation of randomly initialized deep neural networks

A Basteri, D Trevisan - Machine Learning, 2024 - Springer
Given any deep fully connected neural network, initialized with random Gaussian
parameters, we bound from above the quadratic Wasserstein distance between its output …
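
For reference, the quadratic (2-)Wasserstein distance appearing in this entry is the standard optimal-transport distance; the display below is the textbook definition, where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$ on $\mathbb{R}^k$, not a statement of the paper's bound.

```latex
% Standard definition of the quadratic Wasserstein distance between
% probability measures mu and nu on R^k.
W_2(\mu,\nu) \;=\; \left( \inf_{\pi \in \Pi(\mu,\nu)}
  \int_{\mathbb{R}^k \times \mathbb{R}^k} \|x-y\|^2 \,\mathrm{d}\pi(x,y) \right)^{1/2}
```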

Non-asymptotic approximations of neural networks by Gaussian processes

R Eldan, D Mikulincer… - Conference on Learning …, 2021 - proceedings.mlr.press
We study the extent to which wide neural networks may be approximated by Gaussian
processes, when initialized with random weights. It is a well-established fact that as the …

Fundamental limits of overparametrized shallow neural networks for supervised learning

F Camilli, D Tieplova, J Barbier - arXiv preprint arXiv:2307.05635, 2023 - arxiv.org
We carry out an information-theoretical analysis of a two-layer neural network trained from
input-output pairs generated by a teacher network with matching architecture, in …
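
A toy illustration of the data-generating setting this entry describes (not its information-theoretic analysis): input-output pairs produced by a random two-layer teacher network, on which a student of matching architecture would be trained. Widths, activation, and noise level are assumptions made for the sketch.

```python
# Generate supervised data from a random two-layer "teacher" network.
import numpy as np

rng = np.random.default_rng(0)
d, k, n_samples = 20, 5, 1000        # input dim, hidden width, dataset size

# Teacher parameters, drawn once and then fixed.
W_teacher = rng.normal(size=(k, d)) / np.sqrt(d)
a_teacher = rng.normal(size=k) / np.sqrt(k)

X = rng.normal(size=(n_samples, d))
Y = np.tanh(X @ W_teacher.T) @ a_teacher + 0.1 * rng.normal(size=n_samples)

# A student with matching architecture would be fit to (X, Y).
print(X.shape, Y.shape)
```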

Neural network Gaussian processes by increasing depth

SQ Zhang, F Wang, FL Fan - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
Recent years have witnessed an increasing interest in the correspondence between
infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of …
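
As background for the width-based correspondence mentioned in this entry, the standard infinite-width NNGP kernel recursion for a fully connected network is reproduced below (the paper's depth-based construction is different); $\sigma_w$, $\sigma_b$, and $\phi$ are generic initialization scales and activation, not this paper's notation.

```latex
% Standard NNGP kernel recursion in the infinite-width limit.
\begin{aligned}
K^{(1)}(x,x') &= \sigma_b^2 + \frac{\sigma_w^2}{d}\, x^\top x',\\
K^{(\ell+1)}(x,x') &= \sigma_b^2 + \sigma_w^2\,
  \mathbb{E}_{(u,v)\sim\mathcal{N}(0,\,\Lambda^{(\ell)})}\bigl[\phi(u)\,\phi(v)\bigr],
\qquad
\Lambda^{(\ell)} = \begin{pmatrix}
  K^{(\ell)}(x,x) & K^{(\ell)}(x,x')\\
  K^{(\ell)}(x,x') & K^{(\ell)}(x',x')
\end{pmatrix}.
\end{aligned}
```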

Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities

A Bordino, S Favaro, S Fortini - Proceedings of Machine …, 2024 - iris.unibocconi.it
There is a recent and growing literature on large-width asymptotic and non-asymptotic
properties of deep Gaussian neural networks (NNs), namely NNs with weights initialized as …

Over-parameterised shallow neural networks with asymmetrical node scaling: global convergence guarantees and feature learning

F Caron, F Ayed, P Jung, H Lee, J Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
We consider the optimisation of large and shallow neural networks via gradient flow, where
the output of each hidden node is scaled by some positive parameter. We focus on the case …
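
A minimal sketch of a shallow network with per-node output scaling, as this entry describes; the particular scaling scheme below (a mix of a symmetric 1/m part and a normalized heterogeneous part) is only an illustrative assumption, not the paper's exact parameterisation.

```python
# Shallow ReLU network in which each hidden node's output is multiplied by
# its own positive scale lam[j].
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 256                                   # input dim, number of hidden nodes

lam_equal = np.full(m, 1.0 / m)                  # symmetric part
lam_hetero = rng.exponential(size=m)
lam_hetero /= lam_hetero.sum()                   # heterogeneous part, sums to one
gamma = 0.5                                      # mixing weight (assumed)
lam = gamma * lam_equal + (1.0 - gamma) * lam_hetero   # positive node scales

W = rng.normal(size=(m, d))
a = rng.normal(size=m)

def f(x):
    # Each hidden node's output is scaled by lam[j] before summation.
    return np.sum(lam * a * np.maximum(W @ x, 0.0))

print(f(rng.normal(size=d)))
```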

Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes

D Trevisan - arXiv preprint arXiv:2312.11737, 2023 - arxiv.org
We establish novel rates for the Gaussian approximation of random deep neural networks
with Gaussian parameters (weights and biases) and Lipschitz activation functions, in the …

A generalized neural tangent kernel for surrogate gradient learning

L Eilers, RM Memmesheimer, S Goedeke - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art neural network training methods depend on the gradient of the network
function. Therefore, they cannot be applied to networks whose activation functions do not …
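
A minimal sketch of surrogate gradient learning, the training scheme this entry generalizes the NTK to (illustrative, not the paper's construction): the forward pass uses a Heaviside step activation, whose true derivative is zero almost everywhere, while the backward pass substitutes the derivative of a steep sigmoid. The network size, learning rate, and surrogate steepness are assumptions.

```python
# One-hidden-layer network y = v . heaviside(W x) trained on a toy target
# with a surrogate gradient replacing the step function's derivative.
import numpy as np

rng = np.random.default_rng(0)

def heaviside(z):
    return (z > 0.0).astype(z.dtype)           # non-differentiable forward

def surrogate_grad(z, beta=5.0):
    s = 1.0 / (1.0 + np.exp(-beta * z))        # steep sigmoid
    return beta * s * (1.0 - s)                # used in place of d heaviside / dz

d, n = 3, 32
W = rng.normal(size=(n, d)) / np.sqrt(d)
v = rng.normal(size=n) / np.sqrt(n)
x = rng.normal(size=d)
y_target = 1.0

lr = 0.1
for step in range(200):
    z = W @ x
    h = heaviside(z)
    y = v @ h
    err = y - y_target                         # squared-loss residual
    # Backward pass: the zero derivative of heaviside is replaced by the surrogate.
    grad_v = err * h
    grad_z = err * v * surrogate_grad(z)
    grad_W = np.outer(grad_z, x)
    v -= lr * grad_v
    W -= lr * grad_W

print("final output:", v @ heaviside(W @ x))
```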

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

S Adams, M Lahijanian, L Laurenti - arXiv preprint arXiv:2407.18707, 2024 - arxiv.org
Infinitely wide or deep neural networks (NNs) with independent and identically distributed
(iid) parameters have been shown to be equivalent to Gaussian processes. Because of the …