Width and depth limits commute in residual networks

S Hayou, G Yang - International Conference on Machine …, 2023 - proceedings.mlr.press
We show that taking the width and depth to infinity in a deep neural network with skip
connections, when branches are scaled by $1/\sqrt{\text{depth}}$, results in the same covariance …
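A minimal PyTorch sketch of the branch scaling this abstract refers to: each residual branch is multiplied by $1/\sqrt{L}$ for a network of $L$ blocks. The class, widths, and activation are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch is scaled by 1/sqrt(depth),
    i.e. x_{l+1} = x_l + (1/sqrt(L)) * phi(W_l x_l)."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.scale = depth ** -0.5  # the 1/sqrt(depth) factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * torch.relu(self.linear(x))

width, depth = 512, 64
net = nn.Sequential(*[ScaledResidualBlock(width, depth) for _ in range(depth)])
x = torch.randn(8, width)
print(net(x).shape)  # torch.Size([8, 512])
```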

On the infinite-depth limit of finite-width neural networks

S Hayou - Transactions on Machine Learning Research, 2022 - openreview.net
In this paper, we study the infinite-depth limit of finite-width residual neural networks with
random Gaussian weights. With proper scaling, we show that by fixing the width and taking …

Fast propagation is better: Accelerating single-step adversarial training via sampling subnetworks

X Jia, J Li, J Gu, Y Bai, X Cao - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Adversarial training has shown promise in building robust models against adversarial
examples. A major drawback of adversarial training is the computational overhead …
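For context, single-step adversarial training builds on a one-step attack such as FGSM. A generic sketch of that step follows; `model` is a hypothetical classifier, and the paper's subnetwork-sampling acceleration is not reproduced here.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Single-step (FGSM) adversarial example: one signed gradient step
    on the input, clipped back to the valid pixel range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```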

Continuously evolving dropout with multi-objective evolutionary optimisation

P Jiang, Y Xue, F Neri - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Dropout is an effective method of mitigating over-fitting while training deep neural networks
(DNNs). This method consists of switching off (dropping) some of the neurons of the DNN …
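A minimal sketch of the base dropout mechanism the abstract describes, in the common inverted-dropout form; the paper's multi-objective evolutionary tuning of the dropout rates is not shown.

```python
import torch

def dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Zero each activation with probability p during training and rescale
    survivors by 1/(1-p), so the expected activation matches test time."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()  # 1 = keep, 0 = drop
    return x * mask / (1.0 - p)
```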

Exploring the robustness of decentralized training for large language models

L Lu, C Dai, W Tao, B Yuan, Y Sun, P Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
Decentralized training of large language models has emerged as an effective way to
democratize this technology. However, the potential threats associated with this approach …

Karina: An efficient deep learning model for global weather forecast

M Cheon, YH Choi, SY Kang, Y Choi, JG Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep learning-based, data-driven models are gaining prevalence in climate research,
particularly for global weather prediction. However, training on global weather data at high …

Commutative Width and Depth Scaling in Deep Neural Networks

S Hayou - arXiv preprint arXiv:2310.01683, 2023 - arxiv.org
This paper is the second in the series Commutative Scaling of Width and Depth (WD) on the
commutativity of infinite width and depth limits in deep neural networks. Our aim is to …

The curse of (non) convexity: The case of an Optimization-Inspired Data Pruning algorithm

F Ayed, S Hayou - I Can't Believe It's Not Better Workshop …, 2022 - openreview.net
Data pruning consists of identifying a subset of the training set that can be used for training
instead of the full dataset. This pruned dataset is often chosen to satisfy some desirable …
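A generic sketch of the data-pruning setup described here: keep only a scored subset of the training set. The `scores` criterion is a hypothetical placeholder (e.g. per-example loss under a reference model); the paper's optimization-inspired algorithm is not reproduced.

```python
import numpy as np

def prune_dataset(X, y, scores, keep_frac=0.5):
    """Keep the top `keep_frac` fraction of examples ranked by `scores`."""
    k = int(len(X) * keep_frac)
    idx = np.argsort(scores)[-k:]  # indices of the k highest-scoring examples
    return X[idx], y[idx]
```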

The Equilibrium Hypothesis: Rethinking implicit regularization in Deep Neural Networks

Y Lou, C Mingard, S Hayou - stat, 2021 - academia.edu
Modern Deep Neural Networks (DNNs) exhibit impressive generalization properties
on a variety of tasks without explicit regularization, suggesting the existence of hidden …

Unveiling the Transport Dynamics of Neural Networks: a Least Action Principle for Deep Learning

AS Karkar - 2023 - theses.hal.science
Residual connections are ubiquitous in deep learning: besides residual networks and their
variants, they are also present in Transformer architectures. The dynamic view of …