A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e., when during training the network behaves as a kernelized linear predictor, and thus training …
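As a toy illustration (not taken from the snippet above), "behaves as a kernelized linear predictor" can be made concrete by linearising a small one-hidden-layer network around its initialisation and fitting a linear model on the tangent features grad_theta f(x; theta_0); all dimensions, names, and the target below are made up for the sketch.

import numpy as np

# Toy sketch of the kernel regime: linearise a one-hidden-layer ReLU network
# around its initialisation and train the resulting linear predictor on the
# tangent features grad_theta f(x; theta_0).
rng = np.random.default_rng(0)
n, d, m = 30, 5, 500                        # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                         # arbitrary regression target

W0 = rng.standard_normal((m, d))            # hidden weights at initialisation
a0 = rng.choice([-1.0, 1.0], size=m)        # output weights at initialisation

def tangent_features(x):
    """Gradient of f(x) = a @ relu(W x) / sqrt(m) w.r.t. (a, W), taken at init."""
    pre = W0 @ x
    d_a = np.maximum(pre, 0.0) / np.sqrt(m)                    # df/da_j
    d_W = (a0 * (pre > 0))[:, None] * x[None, :] / np.sqrt(m)  # df/dW_j
    return np.concatenate([d_a, d_W.ravel()])

Phi = np.stack([tangent_features(x) for x in X])    # n x p tangent-feature matrix
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # kernelised linear fit
print("train MSE of linearised model:", np.mean((Phi @ theta - y) ** 2))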
S Pesme, N Flammarion - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper we fully describe the trajectory of gradient flow over $2$-layer diagonal linear networks for the regression setting in the limit of vanishing initialisation. We show that the …
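A minimal sketch of this setting, assuming the common parameterisation beta = u * v of a 2-layer diagonal linear network; the data, stepsize, and initialisation scale are illustrative, and small-stepsize gradient descent stands in for gradient flow.

import numpy as np

# 2-layer diagonal linear network beta = u * v for sparse regression, trained
# with small-stepsize gradient descent as a stand-in for gradient flow, from a
# small initialisation alpha (the "vanishing initialisation" regime).
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [2.0, -1.0, 0.5]            # sparse ground-truth regressor
y = X @ beta_star

alpha = 1e-4                                # small initialisation scale
u = alpha * np.ones(d)                      # one common choice: u = alpha, v = 0,
v = np.zeros(d)                             # so beta = u * v can take either sign

lr = 5e-3                                   # small stepsize, close to gradient flow
for _ in range(20_000):
    r = X @ (u * v) - y                     # residuals of the effective predictor
    g = X.T @ r / n                         # gradient of the loss w.r.t. beta
    u, v = u - lr * g * v, v - lr * g * u   # chain rule through the factorisation

print(np.round(u * v, 2)[:6])               # recovered predictor, approximately sparse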
This paper studies the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization …
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
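The sketch below only sets up the two training loops that such a comparison needs, full-batch GD versus single-sample SGD on the same diagonal linear network, so their recovered solutions can be compared at different stepsizes; it makes no claim about the paper's actual conclusions, and all constants are illustrative.

import numpy as np

# Compare full-batch GD and single-sample SGD on the same diagonal linear
# network beta = u * v, then report the l1 norm and training error of each.
rng = np.random.default_rng(1)
n, d = 40, 100
X = rng.standard_normal((n, d))
beta_star = np.zeros(d); beta_star[:5] = 1.0
y = X @ beta_star

def train(stochastic, lr, steps=50_000, alpha=1e-3, seed=0):
    rs = np.random.default_rng(seed)
    u, v = alpha * np.ones(d), np.zeros(d)
    for _ in range(steps):
        if stochastic:                       # single-sample stochastic gradient
            i = rs.integers(n)
            g = X[i] * (X[i] @ (u * v) - y[i])
        else:                                # full-batch gradient
            g = X.T @ (X @ (u * v) - y) / n
        u, v = u - lr * g * v, v - lr * g * u
    return u * v

for name, beta in [("GD ", train(False, lr=5e-3)),
                   ("SGD", train(True,  lr=5e-3))]:
    print(name, "l1 norm:", round(float(np.abs(beta).sum()), 2),
          "train MSE:", round(float(np.mean((X @ beta - y) ** 2)), 4))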
Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of …
C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees …
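A tabular toy sketch related to this snippet: with a softmax policy, the KL mirror-descent (policy mirror descent) step reduces to the multiplicative update pi(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)). This is only a tabular illustration, not the general parameterized setting the paper analyses, and the small MDP below is made up.

import numpy as np

# Policy mirror descent with a softmax policy on a tiny 2-state, 2-action MDP.
nS, nA, gamma, eta = 2, 2, 0.9, 1.0
P = np.array([[[0.9, 0.1], [0.2, 0.8]],     # P[s, a, s']: transition kernel
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                   # R[s, a]: expected reward
              [0.0, 2.0]])

def q_values(pi):
    """Exact Q^pi via policy evaluation (solve the Bellman linear system)."""
    P_pi = np.einsum("sa,sap->sp", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward under pi
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    return R + gamma * P @ V                # Q[s, a]

pi = np.full((nS, nA), 1.0 / nA)            # start from the uniform policy
for _ in range(50):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)               # KL mirror-descent step
    pi /= pi.sum(axis=1, keepdims=True)     # renormalise per state

print("resulting policy:", np.round(pi, 3))
print("value of state 0:", round(float(q_values(pi)[0] @ pi[0]), 3))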
H Papazov, S Pesme… - … Conference on Artificial …, 2024 - proceedings.mlr.press
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient …
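An illustrative sketch of the discrete dynamics behind this continuous-time viewpoint: heavy-ball momentum gradient descent on a simple quadratic, whose small-stepsize limit is the second-order ODE x'' + a x' + grad f(x) = 0. The objective and constants below are made up for the example.

import numpy as np

# Heavy-ball momentum versus plain gradient descent on an ill-conditioned quadratic.
A = np.diag([1.0, 10.0])                    # f(x) = x^T A x / 2
grad = lambda x: A @ x

def run(momentum, lr=1e-2, steps=2000):
    x = np.array([1.0, 1.0])
    velocity = np.zeros(2)
    traj = [x.copy()]
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)   # heavy-ball update
        x = x + velocity
        traj.append(x.copy())
    return np.array(traj)                   # whole trajectory, not just the endpoint

plain = run(momentum=0.0)                   # vanilla gradient descent
heavy = run(momentum=0.9)                   # gradient descent with momentum
print("final |x|, GD:      ", np.linalg.norm(plain[-1]))
print("final |x|, momentum:", np.linalg.norm(heavy[-1]))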