A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e., when during training the network behaves as a kernelized linear predictor, and thus training …
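As a toy illustration (not taken from the snippet above), "behaves as a kernelized linear predictor" can be made concrete by linearising a small one-hidden-layer network around its initialisation and fitting a linear model on the tangent features grad_theta f(x; theta_0); all dimensions, names, and the target below are made up for the sketch.

import numpy as np

# Toy sketch of the kernel regime: linearise a one-hidden-layer ReLU network
# around its initialisation and train the resulting linear predictor on the
# tangent features grad_theta f(x; theta_0).
rng = np.random.default_rng(0)
n, d, m = 30, 5, 500                        # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                         # arbitrary regression target

W0 = rng.standard_normal((m, d))            # hidden weights at initialisation
a0 = rng.choice([-1.0, 1.0], size=m)        # output weights at initialisation

def tangent_features(x):
    """Gradient of f(x) = a @ relu(W x) / sqrt(m) w.r.t. (a, W), taken at init."""
    pre = W0 @ x
    d_a = np.maximum(pre, 0.0) / np.sqrt(m)                    # df/da_j
    d_W = (a0 * (pre > 0))[:, None] * x[None, :] / np.sqrt(m)  # df/dW_j
    return np.concatenate([d_a, d_W.ravel()])

Phi = np.stack([tangent_features(x) for x in X])    # n x p tangent-feature matrix
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # kernelised linear fit
print("train MSE of linearised model:", np.mean((Phi @ theta - y) ** 2))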
S Pesme, N Flammarion - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper we fully describe the trajectory of gradient flow over $2$-layer diagonal linear networks for the regression setting in the limit of vanishing initialisation. We show that the …
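A minimal sketch of this setting, assuming the common parameterisation beta = u * v of a 2-layer diagonal linear network; the data, stepsize, and initialisation scale are illustrative, and small-stepsize gradient descent stands in for gradient flow.

import numpy as np

# 2-layer diagonal linear network beta = u * v for sparse regression, trained
# with small-stepsize gradient descent as a stand-in for gradient flow, from a
# small initialisation alpha (the "vanishing initialisation" regime).
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [2.0, -1.0, 0.5]            # sparse ground-truth regressor
y = X @ beta_star

alpha = 1e-4                                # small initialisation scale
u = alpha * np.ones(d)                      # one common choice: u = alpha, v = 0,
v = np.zeros(d)                             # so beta = u * v can take either sign

lr = 5e-3                                   # small stepsize, close to gradient flow
for _ in range(20_000):
    r = X @ (u * v) - y                     # residuals of the effective predictor
    g = X.T @ r / n                         # gradient of the loss w.r.t. beta
    u, v = u - lr * g * v, v - lr * g * u   # chain rule through the factorisation

print(np.round(u * v, 2)[:6])               # recovered predictor, approximately sparse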
This paper studies the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization …
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
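The sketch below only sets up the two training loops that such a comparison needs, full-batch GD versus single-sample SGD on the same diagonal linear network, so their recovered solutions can be compared at different stepsizes; it makes no claim about the paper's actual conclusions, and all constants are illustrative.

import numpy as np

# Compare full-batch GD and single-sample SGD on the same diagonal linear
# network beta = u * v, then report the l1 norm and training error of each.
rng = np.random.default_rng(1)
n, d = 40, 100
X = rng.standard_normal((n, d))
beta_star = np.zeros(d); beta_star[:5] = 1.0
y = X @ beta_star

def train(stochastic, lr, steps=50_000, alpha=1e-3, seed=0):
    rs = np.random.default_rng(seed)
    u, v = alpha * np.ones(d), np.zeros(d)
    for _ in range(steps):
        if stochastic:                       # single-sample stochastic gradient
            i = rs.integers(n)
            g = X[i] * (X[i] @ (u * v) - y[i])
        else:                                # full-batch gradient
            g = X.T @ (X @ (u * v) - y) / n
        u, v = u - lr * g * v, v - lr * g * u
    return u * v

for name, beta in [("GD ", train(False, lr=5e-3)),
                   ("SGD", train(True,  lr=5e-3))]:
    print(name, "l1 norm:", round(float(np.abs(beta).sum()), 2),
          "train MSE:", round(float(np.mean((X @ beta - y) ** 2)), 4))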
Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of …
C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees …
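A tabular toy sketch related to this snippet: with a softmax policy, the KL mirror-descent (policy mirror descent) step reduces to the multiplicative update pi(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)). This is only a tabular illustration, not the general parameterized setting the paper analyses, and the small MDP below is made up.

import numpy as np

# Policy mirror descent with a softmax policy on a tiny 2-state, 2-action MDP.
nS, nA, gamma, eta = 2, 2, 0.9, 1.0
P = np.array([[[0.9, 0.1], [0.2, 0.8]],     # P[s, a, s']: transition kernel
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                   # R[s, a]: expected reward
              [0.0, 2.0]])

def q_values(pi):
    """Exact Q^pi via policy evaluation (solve the Bellman linear system)."""
    P_pi = np.einsum("sa,sap->sp", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward under pi
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    return R + gamma * P @ V                # Q[s, a]

pi = np.full((nS, nA), 1.0 / nA)            # start from the uniform policy
for _ in range(50):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)               # KL mirror-descent step
    pi /= pi.sum(axis=1, keepdims=True)     # renormalise per state

print("resulting policy:", np.round(pi, 3))
print("value of state 0:", round(float(q_values(pi)[0] @ pi[0]), 3))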
H Papazov, S Pesme… - … Conference on Artificial …, 2024 - proceedings.mlr.press
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient …
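An illustrative sketch of the discrete dynamics behind this continuous-time viewpoint: heavy-ball momentum gradient descent on a simple quadratic, whose small-stepsize limit is the second-order ODE x'' + a x' + grad f(x) = 0. The objective and constants below are made up for the example.

import numpy as np

# Heavy-ball momentum versus plain gradient descent on an ill-conditioned quadratic.
A = np.diag([1.0, 10.0])                    # f(x) = x^T A x / 2
grad = lambda x: A @ x

def run(momentum, lr=1e-2, steps=2000):
    x = np.array([1.0, 1.0])
    velocity = np.zeros(2)
    traj = [x.copy()]
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)   # heavy-ball update
        x = x + velocity
        traj.append(x.copy())
    return np.array(traj)                   # whole trajectory, not just the endpoint

plain = run(momentum=0.0)                   # vanilla gradient descent
heavy = run(momentum=0.9)                   # gradient descent with momentum
print("final |x|, GD:      ", np.linalg.norm(plain[-1]))
print("final |x|, momentum:", np.linalg.norm(heavy[-1]))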