Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory

R Zhang, X Zhang, C Ni… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …

Pessimistic nonlinear least-squares value iteration for offline reinforcement learning

Q Di, H Zhao, J He, Q Gu - arXiv preprint arXiv:2310.01380, 2023 - arxiv.org
Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based
on the data collected by a behavior policy, has attracted increasing attention in recent years …

Offline reinforcement learning with differentiable function approximation is provably efficient

M Yin, M Wang, YX Wang - arXiv preprint arXiv:2210.00750, 2022 - arxiv.org
Offline reinforcement learning, which aims at optimizing sequential decision-making
strategies with historical data, has been extensively applied in real-life applications. State-Of …

Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Offline reinforcement learning with differential privacy

D Qiao, YX Wang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The offline reinforcement learning (RL) problem is often motivated by the need to learn data-
driven decision policies in financial, legal and healthcare applications. However, the learned …

Semiparametrically efficient off-policy evaluation in linear Markov decision processes

C Xie, W Yang, Z Zhang - International Conference on …, 2023 - proceedings.mlr.press
We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the
underlying Markov decision process (MDP) is linear with a known feature map. We …

Finding regularized competitive equilibria of heterogeneous agent macroeconomic models via reinforcement learning

R Xu, Y Min, T Wang, MI Jordan… - International …, 2023 - proceedings.mlr.press
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income and …

Robust offline policy evaluation and optimization with heavy-tailed rewards

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …

Sample complexity of offline distributionally robust linear Markov decision processes

H Wang, L Shi, Y Chi - arXiv preprint arXiv:2403.12946, 2024 - arxiv.org
In offline reinforcement learning (RL), the absence of active exploration calls for attention on
the model robustness to tackle the sim-to-real gap, where the discrepancy between the …

Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice

T Kitamura, T Kozuno, Y Tang… - International …, 2023 - proceedings.mlr.press
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-
regularized reinforcement learning (RL), has served as the basis for recent high-performing …