Natural actor-critic for robust reinforcement learning with function approximation

R Zhou, T Liu, M Cheng, D Kalathil… - Advances in neural …, 2024 - proceedings.neurips.cc
We study robust reinforcement learning (RL) with the goal of determining a well-performing
policy that is robust against model mismatch between the training simulator and the testing …

Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage

J Blanchet, M Lu, T Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study distributionally robust offline reinforcement learning (RL), which seeks to find an
optimal robust policy purely from an offline dataset that can perform well in perturbed …

Learning cut selection for mixed-integer linear programming via hierarchical sequence model

Z Wang, X Li, J Wang, Y Kuang, M Yuan, J Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
Cutting planes (cuts) are important for solving mixed-integer linear programs (MILPs), which
formulate a wide range of important real-world applications. Cut selection--which aims to …

Adjustable robust reinforcement learning for online 3D bin packing

Y Pan, Y Chen, F Lin - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a
long-standing challenge, primarily due to the unpredictable nature of incoming box …

Learning to stop cut generation for efficient mixed-integer linear programming

H Ling, Z Wang, J Wang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Cutting planes (cuts) play an important role in solving mixed-integer linear programs
(MILPs), as they significantly tighten the dual bounds and improve the solving performance …

UAV air combat autonomous trajectory planning method based on robust adversarial reinforcement learning

L Wang, S Zheng, S Tai, H Liu, T Yue - Aerospace Science and Technology, 2024 - Elsevier
The poor robustness of the air combat autonomous trajectory planning strategy (ATP)
trained through vanilla reinforcement learning (RL) methods is attributed to its dependence …

Provable sim-to-real transfer in continuous domain with partial observations

J Hu, H Zhong, C Jin, L Wang - arXiv preprint arXiv:2210.15598, 2022 - arxiv.org
Sim-to-real transfer trains RL agents in the simulated environments and then deploys them
in the real world. Sim-to-real transfer has been widely used in practice because it is often …

Minimax optimal and computationally efficient algorithms for distributionally robust offline reinforcement learning

Z Liu, P Xu - arXiv preprint arXiv:2403.09621, 2024 - arxiv.org
Distributionally robust offline reinforcement learning (RL), which seeks robust policy training
against environment perturbation by modeling dynamics uncertainty, calls for function …

Optimal transport perturbations for safe reinforcement learning with robustness guarantees

J Queeney, EC Ozcan, IC Paschalidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Robustness and safety are critical for the trustworthy deployment of deep reinforcement
learning in real-world decision making applications. In particular, we require algorithms that …

Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs

W Lixin, S Zheng, P Haiyin, Lu Changqian… - Chinese Journal of …, 2024 - Elsevier
Reinforcement Learning (RL) algorithms enhance intelligence of air combat Autonomous
Maneuver Decision (AMD) policy, but they may underperform in target combat environments …