Adapting to online label shift with provable guarantees

Y Bai, YJ Zhang, P Zhao… - Advances in Neural …, 2022 - proceedings.neurips.cc
The standard supervised learning paradigm works effectively when training data shares the
same distribution as the upcoming testing samples. However, this stationary assumption is …

Efficient and optimal algorithms for contextual dueling bandits under realizability

A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press
We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …

Efficient methods for non-stationary online learning

P Zhao, YF Xie, L Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Non-stationary online learning has drawn much attention in recent years. In particular,
dynamic regret and adaptive regret are proposed as two principled performance …

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …

Optimal dynamic regret in proper online learning with strongly convex losses and beyond

D Baby, YX Wang - International Conference on Artificial …, 2022 - proceedings.mlr.press
We study the framework of universal dynamic regret minimization with strongly convex
losses. We answer an open problem in Baby and Wang 2021 by showing that in a proper …

Dynamic regret of adversarial linear mixture MDPs

LF Li, P Zhao, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-
information rewards and the unknown transition kernel. We consider the linear mixture …

Non-stationary online learning with memory and non-stochastic control

P Zhao, YH Yan, YX Wang, ZH Zhou - The Journal of Machine Learning …, 2023 - dl.acm.org
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …

Adapting to continuous covariate shift via online density ratio estimation

YJ Zhang, ZY Zhang, P Zhao… - Advances in Neural …, 2024 - proceedings.neurips.cc
Dealing with distribution shifts is one of the central challenges for modern machine learning.
One fundamental situation is the covariate shift, where the input distributions of data change …

Dynamic regret of online Markov decision processes

P Zhao, LF Li, ZH Zhou - International Conference on …, 2022 - proceedings.mlr.press
We investigate online Markov Decision Processes (MDPs) with adversarially
changing loss functions and known transitions. We choose dynamic regret as the …

Unconstrained dynamic regret via sparse coding

Z Zhang, A Cutkosky… - Advances in Neural …, 2024 - proceedings.neurips.cc
Motivated by the challenge of nonstationarity in sequential decision making, we study Online
Convex Optimization (OCO) under the coupling of two problem structures: the domain is …