A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research …
This work addresses decentralized online optimization in nonstationary environments. A network of agents aims to track the minimizer of a global, time-varying, and convex function …
We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between cumulative …
M Zhang, P Zhao, H Luo… - … Conference on Machine …, 2022 - proceedings.mlr.press
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game …
Y Chen, CW Lee, H Luo… - Conference on Learning …, 2019 - proceedings.mlr.press
We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret. Specifically, our algorithm achieves $\mathcal{O}(\min\{\sqrt …
Y Bai, YJ Zhang, P Zhao… - Advances in Neural …, 2022 - proceedings.neurips.cc
The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this stationary assumption is …
X Li, X Yi, L Xie - IEEE Transactions on Automatic Control, 2020 - ieeexplore.ieee.org
This article investigates the distributed online optimization problem over a multi-agent network subject to local set constraints and coupled inequality constraints, which has a lot of …
X Yi, X Li, T Yang, L Xie, T Chai… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Distributed bandit online convex optimization with time-varying coupled inequality constraints is considered, motivated by a repeated game between a group of learners and …
Recently, there has been a growing research interest in the analysis of dynamic regret, which measures the performance of an online learner against a sequence of local …