Beyond black-box advice: learning-augmented algorithms for MDPs with Q-value predictions

T Li, Y Lin, S Ren, A Wierman - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the tradeoff between consistency and robustness in the context of a
single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned …

Preparing for Black Swans: The Antifragility Imperative for Machine Learning

M Jin - arXiv preprint arXiv:2405.11397, 2024 - arxiv.org
Operating safely and reliably despite continual distribution shifts is vital for high-stakes
machine learning applications. This paper builds upon the transformative concept …

Online Policy Optimization in Unknown Nonlinear Systems

Y Lin, JA Preiss, F Xie, E Anand, SJ Chung… - arXiv preprint arXiv …, 2024 - arxiv.org
We study online policy optimization in nonlinear time-varying dynamical systems where the
true dynamical models are unknown to the controller. This problem is challenging because …

Online convex optimization for robust control of constrained dynamical systems

M Nonhoff, E Dall'Anese, MA Müller - arXiv preprint arXiv:2401.04487, 2024 - arxiv.org
This article investigates the problem of controlling linear time-invariant systems subject to
time-varying and a priori unknown cost functions, state and input constraints, and …

Online Bandit Control with Dynamic Batch Length and Adaptive Learning Rate

J Kim, J Lavaei - lavaei.ieor.berkeley.edu
This paper is concerned with the online bandit control problem, which aims to learn the best
stabilizing controller from a pool of stabilizing and destabilizing controllers of unknown types …