Mastering diverse domains through world models

D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv preprint arXiv:2301.04104, 2023 - arxiv.org
Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …

Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems

M Giegrich, C Reisinger, Y Zhang - SIAM Journal on Control and Optimization, 2024 - SIAM
We study the global linear convergence of policy gradient (PG) methods for finite-horizon
continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes …

Examining Responsibility and Deliberation in AI Impact Statements and Ethics Reviews

D Liu, P Nanayakkara, SA Sakha… - Proceedings of the …, 2022 - dl.acm.org
The artificial intelligence research community is continuing to grapple with the ethics of its
work by encouraging researchers to discuss potential positive and negative consequences …

Reinforcement Learning for Jump-Diffusions

X Gao, L Li, XY Zhou - arXiv preprint arXiv:2405.16449, 2024 - arxiv.org
We study continuous-time reinforcement learning (RL) for stochastic control in which system
dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized …

Reinforcement Learning with Elastic Time Steps

D Wang, G Beltrame - arXiv preprint arXiv:2402.14961, 2024 - arxiv.org
Traditional Reinforcement Learning (RL) algorithms are usually applied in robotics to learn
controllers that act with a fixed control rate. Given the discrete nature of RL algorithms, they …

Learning Uncertainty-Aware Temporally-Extended Actions

J Lee, SJ Park, Y Tang, M Oh - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
In reinforcement learning, temporal abstraction in the action space is a common approach to
simplifying the learning process of policies through temporally-extended courses of action …

Managing temporal resolution in continuous value estimation: A fundamental trade-off

ZV Zhang, J Kirschner, J Zhang… - Advances in …, 2024 - proceedings.neurips.cc
A default assumption in reinforcement learning (RL) and optimal control is that observations
arrive at discrete time points on a fixed clock cycle. Yet, many applications involve …

Dynamic Decision Frequency with Continuous Options

A Karimi, J Jin, J Luo, AR Mahmood… - 2023 IEEE/RSJ …, 2023 - ieeexplore.ieee.org
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed
time intervals. The duration between decisions becomes a crucial hyperparameter, as …

Sublinear regret for an actor-critic algorithm in continuous-time linear-quadratic reinforcement learning

Y Huang, Y Jia, XY Zhou - Available at SSRN 4904358, 2024 - papers.ssrn.com
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ)
control problems for diffusions where volatility of the state processes depends on both state …

Simultaneously updating all persistence values in reinforcement learning

L Sabbioni, L Al Daire, L Bisi, AM Metelli… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In Reinforcement Learning, the performance of learning agents is highly sensitive to
the choice of time discretization. Agents acting at high frequencies have the best control …