Optimal dynamic regret in exp-concave online learning

Y Bai, YJ Zhang, P Zhao… - Advances in Neural …, 2022 - proceedings.neurips.cc

The standard supervised learning paradigm works effectively when training data shares the
same distribution as the upcoming testing samples. However, this stationary assumption is …

Cited by 24 · 9 versions

Efficient and optimal algorithms for contextual dueling bandits under realizability

A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press

We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …

Cited by 35 · 3 versions

Efficient methods for non-stationary online learning

P Zhao, YF Xie, L Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Non-stationary online learning has drawn much attention in recent years. In particular,
dynamic regret and adaptive regret are proposed as two principled performance …

Cited by 17 · 12 versions

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org

We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …

Cited by 34 · 5 versions

Optimal dynamic regret in proper online learning with strongly convex losses and beyond

D Baby, YX Wang - International Conference on Artificial …, 2022 - proceedings.mlr.press

We study the framework of universal dynamic regret minimization with strongly convex
losses. We answer an open problem in Baby and Wang 2021 by showing that in a proper …

Cited by 30 · 4 versions

Dynamic regret of adversarial linear mixture MDPs

LF Li, P Zhao, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial
full-information rewards and the unknown transition kernel. We consider the linear mixture …

Cited by 4 · 5 versions

Non-stationary online learning with memory and non-stochastic control

P Zhao, YH Yan, YX Wang, ZH Zhou - The Journal of Machine Learning …, 2023 - dl.acm.org

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …

Cited by 45 · 8 versions

Adapting to continuous covariate shift via online density ratio estimation

YJ Zhang, ZY Zhang, P Zhao… - Advances in Neural …, 2024 - proceedings.neurips.cc

Dealing with distribution shifts is one of the central challenges for modern machine learning.
One fundamental situation is the covariate shift, where the input distributions of data change …

Cited by 10 · 8 versions

Dynamic regret of online Markov decision processes

P Zhao, LF Li, ZH Zhou - International Conference on …, 2022 - proceedings.mlr.press

We investigate online Markov Decision Processes (MDPs) with adversarially
changing loss functions and known transitions. We choose dynamic regret as the …

Cited by 16 · 9 versions

Unconstrained dynamic regret via sparse coding

Z Zhang, A Cutkosky… - Advances in Neural …, 2024 - proceedings.neurips.cc

Motivated by the challenge of nonstationarity in sequential decision making, we study Online
Convex Optimization (OCO) under the coupling of two problem structures: the domain is …

Cited by 8 · 7 versions
