OneNet: Enhancing time series forecasting models under concept drift by online ensembling

Q Wen, W Chen, L Sun, Z Zhang… - Advances in …, 2023 - proceedings.neurips.cc
Online updating of time series forecasting models aims to address the concept drifting
problem by efficiently updating forecasting models based on streaming data. Many …

Kernel and rich regimes in overparametrized models

B Woodworth, S Gunasekar, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press
A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e.
when during training the network behaves as a kernelized linear predictor, and thus, training …
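As background for this entry: the “kernel regime” usually refers to training in which the network is well approximated by its first-order expansion around initialisation, $f(x; \theta) \approx f(x; \theta_0) + \langle \nabla_\theta f(x; \theta_0),\, \theta - \theta_0 \rangle$, so that gradient descent effectively fits a linear predictor with the fixed feature map $x \mapsto \nabla_\theta f(x; \theta_0)$ (equivalently, a kernel method with the corresponding tangent kernel); the “rich regime” of the title is, roughly, the complementary setting in which this linearisation breaks down.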

Saddle-to-saddle dynamics in diagonal linear networks

S Pesme, N Flammarion - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper we fully describe the trajectory of gradient flow over $2$-layer diagonal linear
networks for the regression setting in the limit of vanishing initialisation. We show that the …
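For reference across the diagonal-linear-network entries in this list: a $2$-layer diagonal linear network parameterises a linear predictor through a coordinate-wise product, e.g. $f_{u,v}(x) = \langle u \odot v,\ x \rangle$ with effective regression vector $\beta = u \odot v$; the question studied in these works is how gradient-based training of $(u, v)$ (here, gradient flow in the limit of vanishing initialisation) biases the recovered $\beta$.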

On the spectral bias of two-layer linear networks

AV Varre, ML Vladarean… - Advances in …, 2024 - proceedings.neurips.cc
This paper studies the behaviour of two-layer fully connected networks with linear
activations trained with gradient flow on the square loss. We show how the optimization …
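For reference: a two-layer linear network computes $f(x) = W_2 W_1 x$, and gradient flow on the square loss evolves the factors jointly, $\dot W_1 = -\nabla_{W_1} L(W_2 W_1)$, $\dot W_2 = -\nabla_{W_2} L(W_2 W_1)$; the “spectral bias” of the title refers to how the spectrum (singular values) of the end-to-end map $W_2 W_1$ is shaped during such training, typically with different components learned at different speeds when initialisation is small.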

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
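The “edge of stability” referenced in this title is the empirically observed regime in which gradient descent runs with a stepsize $\eta$ for which the loss curvature settles near the classical stability threshold, $\lambda_{\max}(\nabla^2 L) \approx 2/\eta$; for a quadratic objective, plain GD is stable only when $\eta\, \lambda_{\max} < 2$, so training at this threshold falls outside standard descent analyses.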

Implicit bias of sgd for diagonal linear networks: a provable benefit of stochasticity

S Pesme, L Pillaud-Vivien… - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the implicit bias of training algorithms is of crucial importance in order to
explain the success of overparametrised neural networks. In this paper, we study the …

Label noise (stochastic) gradient descent implicitly solves the lasso for quadratic parametrisation

L Pillaud-Vivien, J Reygner… - Conference on Learning …, 2022 - proceedings.mlr.press
Understanding the implicit bias of training algorithms is of crucial importance in order to
explain the success of overparametrised neural networks. In this paper, we study the role of …
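To make the “quadratic parametrisation” of the title concrete (an illustrative convention; the paper's exact setup may differ): writing the regression vector as $\beta = u \odot u - v \odot v$, one has $\min \{\|u\|_2^2 + \|v\|_2^2 : u \odot u - v \odot v = \beta\} = \|\beta\|_1$, so an implicit $\ell_2$-type bias on the trained parameters $(u, v)$ translates into an $\ell_1$ (lasso) bias on $\beta$; this is the mechanism by which label-noise (stochastic) gradient descent on the reparametrised problem can end up solving a lasso problem.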

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
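For context, the tabular policy mirror descent update that such frameworks generalise is commonly written as $\pi_{t+1}(\cdot \mid s) \in \arg\max_{p \in \Delta(\mathcal{A})} \{\eta\, \langle Q^{\pi_t}(s, \cdot),\ p \rangle - D_h(p, \pi_t(\cdot \mid s))\}$, where $D_h$ is a Bregman divergence (the KL choice recovers softmax/natural-policy-gradient-style updates); the snippet's framework concerns carrying such updates, and their linear convergence guarantees, over to general parameterized policy classes.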

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over …

Leveraging continuous time to understand momentum when training diagonal linear networks

H Papazov, S Pesme… - … Conference on Artificial …, 2024 - proceedings.mlr.press
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient
descent. We leverage a continuous-time approach in the analysis of momentum gradient …
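As background for the continuous-time viewpoint mentioned here: heavy-ball momentum is classically modelled (up to time rescaling) by the second-order ODE $\ddot{\theta}(t) + \gamma\, \dot{\theta}(t) + \nabla L(\theta(t)) = 0$, whose discretisation recovers the usual momentum gradient updates; the listed work applies this kind of analysis to training diagonal linear networks.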