Loss of plasticity in continual deep reinforcement learning

Z Abbas, R Zhao, J Modayil, A White… - … on Lifelong Learning …, 2023 - proceedings.mlr.press
In this paper, we characterize the behavior of canonical value-based deep reinforcement
learning (RL) approaches under varying degrees of non-stationarity. In particular, we …

An analysis of multi-agent reinforcement learning for decentralized inventory control systems

M Mousa, D van de Berg, N Kotecha… - Computers & Chemical …, 2024 - Elsevier
Most solutions to the inventory management problem assume a centralization of information
that is incompatible with organizational constraints in supply chain networks. The problem …

Policy optimization for continuous reinforcement learning

H Zhao, W Tang, D Yao - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We study reinforcement learning (RL) in the setting of continuous time and space, for an
infinite horizon with a discounted objective and the underlying dynamics driven by a …

Hindsight learning for MDPs with exogenous inputs

SR Sinclair, FV Frujeri, CA Cheng… - International …, 2023 - proceedings.mlr.press
Many resource management problems require sequential decision-making under
uncertainty, where the only uncertainty affecting the decision outcomes are exogenous …

Algorithmic and human collusion

T Werner - Available at SSRN 3960738, 2024 - papers.ssrn.com
I study self-learning pricing algorithms and show that they are collusive in market
simulations. To derive a counterfactual that resembles traditional tacit collusion, I conduct …

Deep reinforcement learning for continuous wood drying production line control

FA Tremblay, A Durand, M Morin, P Marier… - Computers in …, 2024 - Elsevier
Continuous high-frequency wood drying, when integrated with a traditional wood finishing
line, allows correcting moisture content one piece of lumber at a time in order to improve its …

Model-based reinforcement learning with scalable composite policy gradient estimators

P Parmas, T Seno, Y Aoki - International Conference on …, 2023 - proceedings.mlr.press
In model-based reinforcement learning (MBRL), policy gradients can be estimated either by
derivative-free RL methods, such as likelihood ratio gradients (LR), or by backpropagating …

Deep neural newsvendor

J Han, M Hu, G Shen - arXiv preprint arXiv:2309.13830, 2023 - arxiv.org
We consider a data-driven newsvendor problem, where one has access to past demand
data and the associated feature information. We solve the problem by estimating the target …

Neural inventory control in networks via hindsight differentiable policy optimization

M Alvo, D Russo, Y Kanoria - arXiv preprint arXiv:2306.11246, 2023 - arxiv.org
Inventory management offers unique opportunities for reliably evaluating and applying deep
reinforcement learning (DRL). Rather than evaluate DRL algorithms by comparing against …

Pivoting Retail Supply Chain with Deep Generative Techniques: Taxonomy, Survey and Insights

Y Wang, LK Sambasivan, M Fu, P Mehrotra - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI applications, such as ChatGPT or DALL-E, have shown the world their
impressive capabilities in generating human-like text or image. Diving deeper, the science …