Constraint-aware policy optimization to solve the vehicle routing problem with time windows

R Zhang, R Yu, W Xia - Information Technology and Control, 2022 - itc.ktu.lt
The vehicle routing problem with time windows (VRPTW), as one of the best-known
combinatorial optimization (CO) problems, is considered to be a tough issue in practice and the …

Learning potential functions and their representations for multi-task reinforcement learning

M Snel, S Whiteson - Autonomous agents and multi-agent systems, 2014 - Springer
In multi-task learning, there are roughly two approaches to discovering representations. The
first is to discover task-relevant representations, i.e., those that compactly represent solutions …

Power Demand Reshaping Using Energy Storage for Distributed Edge Clouds

D Zheng, L Liu, G Tang, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The booming edge computing market that is supported by the edge cloud (EC) infrastructure
has brought huge operating costs, mainly the energy cost, to edge service providers. The …

Monte-Carlo tree search for policy optimization

X Ma, K Driggs-Campbell, Z Zhang… - arXiv preprint arXiv …, 2019 - arxiv.org
Gradient-based methods are often used for policy optimization in deep reinforcement
learning, despite being vulnerable to local optima and saddle points. Although gradient-free …

Off-policy shaping ensembles in reinforcement learning

A Harutyunyan, T Brys, P Vrancx, A Nowé - ECAI 2014, 2014 - ebooks.iospress.nl
Anna Harutyunyan, Tim Brys, Peter Vrancx and …

Using Incomplete and Incorrect Plans to Shape Reinforcement Learning in Long-Sequence Sparse-Reward Tasks

H Müller, D Kudenko - Proc. of the Adaptive and …, 2023 - alaworkshop2023.github.io
Reinforcement learning (RL) agents naturally struggle with long-sequence, sparse-reward
tasks due to the lack of reward feedback during exploration and the problem of identifying …

Multi-agent credit assignment in stochastic resource management games

P Mannion, S Devlin, J Duggan… - The Knowledge …, 2017 - cambridge.org
Multi-agent systems (MASs) are a form of distributed intelligence, where multiple
autonomous agents act in a common environment. Numerous complex, real-world systems …

Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

H Khadilkar, H Meisheri - Proceedings of the 6th Joint International …, 2023 - dl.acm.org
A significant challenge in reinforcement learning is quantifying the complex relationship
between actions and long-term rewards. The effects may manifest themselves over a long …

A Reinforcement Learning Model for Virtual Machines Consolidation in Cloud Data Center

Q Chou, W Fan, J Zhang - 2021 6th international conference on …, 2021 - ieeexplore.ieee.org
Energy consumption in data centers is currently the main focus of many large-scale
enterprises and cloud service providers. Dynamic virtual machine (VM) consolidation …

Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning

SM Devlin - 2013 - etheses.whiterose.ac.uk
Reinforcement learning is a robust artificial intelligence solution for agents required to act in
an environment, making their own decisions on how to behave. Typically an agent is …