Online adaptive policy selection in time-varying systems: No-regret via contractive perturbations

T Li, Y Lin, S Ren, A Wierman - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the tradeoff between consistency and robustness in the context of a single-trajectory
time-varying Markov Decision Process (MDP) with untrusted machine-learned …

Cited by 2 · Related articles · All 6 versions

[PDF] arxiv.org

Preparing for Black Swans: The Antifragility Imperative for Machine Learning

M Jin - arXiv preprint arXiv:2405.11397, 2024 - arxiv.org

Operating safely and reliably despite continual distribution shifts is vital for high-stakes
machine learning applications. This paper builds upon the transformative concept …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Online Policy Optimization in Unknown Nonlinear Systems

Y Lin, JA Preiss, F Xie, E Anand, SJ Chung… - arXiv preprint arXiv …, 2024 - arxiv.org

We study online policy optimization in nonlinear time-varying dynamical systems where the
true dynamical models are unknown to the controller. This problem is challenging because …

Online convex optimization for robust control of constrained dynamical systems

M Nonhoff, E Dall'Anese, MA Müller - arXiv preprint arXiv:2401.04487, 2024 - arxiv.org

This article investigates the problem of controlling linear time-invariant systems subject to
time-varying and a priori unknown cost functions, state and input constraints, and …

[PDF] Online Bandit Control with Dynamic Batch Length and Adaptive Learning Rate

J Kim, J Lavaei - lavaei.ieor.berkeley.edu

This paper is concerned with the online bandit control problem, which aims to learn the best
stabilizing controller from a pool of stabilizing and destabilizing controllers of unknown types …
