Width and depth limits commute in residual networks

S Hayou, G Yang - International Conference on Machine …, 2023 - proceedings.mlr.press
We show that taking the width and depth to infinity in a deep neural network with skip
connections, when branches are scaled by $1/\sqrt{\text{depth}}$, results in the same covariance …
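A minimal PyTorch sketch of the branch scaling this abstract refers to: each residual branch is multiplied by $1/\sqrt{L}$ for a network of $L$ blocks. The class, widths, and activation are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch is scaled by 1/sqrt(depth),
    i.e. x_{l+1} = x_l + (1/sqrt(L)) * phi(W_l x_l)."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.scale = depth ** -0.5  # the 1/sqrt(depth) factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * torch.relu(self.linear(x))

width, depth = 512, 64
net = nn.Sequential(*[ScaledResidualBlock(width, depth) for _ in range(depth)])
x = torch.randn(8, width)
print(net(x).shape)  # torch.Size([8, 512])
```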

On the infinite-depth limit of finite-width neural networks

S Hayou - Transactions on Machine Learning Research, 2022 - openreview.net
In this paper, we study the infinite-depth limit of finite-width residual neural networks with
random Gaussian weights. With proper scaling, we show that by fixing the width and taking …

Fast propagation is better: Accelerating single-step adversarial training via sampling subnetworks

X Jia, J Li, J Gu, Y Bai, X Cao - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Adversarial training has shown promise in building robust models against adversarial
examples. A major drawback of adversarial training is the computational overhead …
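For context, single-step adversarial training builds on a one-step attack such as FGSM. A generic sketch of that step follows; `model` is a hypothetical classifier, and the paper's subnetwork-sampling acceleration is not reproduced here.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Single-step (FGSM) adversarial example: one signed gradient step
    on the input, clipped back to the valid pixel range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```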

Continuously evolving dropout with multi-objective evolutionary optimisation

P Jiang, Y Xue, F Neri - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Dropout is an effective method of mitigating over-fitting while training deep neural networks
(DNNs). This method consists of switching off (dropping) some of the neurons of the DNN …
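A minimal sketch of the base dropout mechanism the abstract describes, in the common inverted-dropout form; the paper's multi-objective evolutionary tuning of the dropout rates is not shown.

```python
import torch

def dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Zero each activation with probability p during training and rescale
    survivors by 1/(1-p), so the expected activation matches test time."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()  # 1 = keep, 0 = drop
    return x * mask / (1.0 - p)
```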

Exploring the robustness of decentralized training for large language models

L Lu, C Dai, W Tao, B Yuan, Y Sun, P Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
Decentralized training of large language models has emerged as an effective way to
democratize this technology. However, the potential threats associated with this approach …

Karina: An efficient deep learning model for global weather forecast

M Cheon, YH Choi, SY Kang, Y Choi, JG Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep learning-based, data-driven models are gaining prevalence in climate research,
particularly for global weather prediction. However, training on global weather data at high …

Commutative Width and Depth Scaling in Deep Neural Networks

S Hayou - arXiv preprint arXiv:2310.01683, 2023 - arxiv.org
This paper is the second in the series Commutative Scaling of Width and Depth (WD) on the
commutativity of infinite width and depth limits in deep neural networks. Our aim is to …

The curse of (non) convexity: The case of an Optimization-Inspired Data Pruning algorithm

F Ayed, S Hayou - I Can't Believe It's Not Better Workshop …, 2022 - openreview.net
Data pruning consists of identifying a subset of the training set that can be used for training
instead of the full dataset. This pruned dataset is often chosen to satisfy some desirable …
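A generic sketch of the data-pruning setup described here: keep only a scored subset of the training set. The `scores` criterion is a hypothetical placeholder (e.g. per-example loss under a reference model); the paper's optimization-inspired algorithm is not reproduced.

```python
import numpy as np

def prune_dataset(X, y, scores, keep_frac=0.5):
    """Keep the top `keep_frac` fraction of examples ranked by `scores`."""
    k = int(len(X) * keep_frac)
    idx = np.argsort(scores)[-k:]  # indices of the k highest-scoring examples
    return X[idx], y[idx]
```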

The Equilibrium Hypothesis: Rethinking implicit regularization in Deep Neural Networks

Y Lou, C Mingard, S Hayou - stat, 2021 - academia.edu
Modern Deep Neural Networks (DNNs) exhibit impressive generalization properties
on a variety of tasks without explicit regularization, suggesting the existence of hidden …

Unveiling the Transport Dynamics of Neural Networks: a Least Action Principle for Deep Learning

AS Karkar - 2023 - theses.hal.science
Residual connections are ubiquitous in deep learning: besides residual networks and their
variants, they are also present in Transformer architectures. The dynamic view of …