What about inputting policy in value function: Policy representation and policy-extended...

Data-driven hospitals staff and resources allocation using agent-based simulation and deep reinforcement learning

T Lazebnik - Engineering Applications of Artificial Intelligence, 2023 - Elsevier

Hospital staff and resources allocation (HSRA) is a critical challenge in healthcare systems,
as it involves balancing the demands of patients, the availability of resources, and the need …

Cited by 21 · Related articles · All 2 versions

Learning useful representations of recurrent neural network weight matrices

V Herrmann, F Faccio, J Schmidhuber - arXiv preprint arXiv:2403.11998, 2024 - arxiv.org

Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The
program of an RNN is its weight matrix. How to learn useful representations of RNN weights …

Cited by 6 · Related articles · All 6 versions

Toward complete coverage planning using deep reinforcement learning by trapezoid-based transformable robot

DT Vo, AV Le, TD Ta, M Tran, P Van Duc, MB Vu… - … Applications of Artificial …, 2023 - Elsevier

Shape-shifting robots are the feasible solutions to solve the Complete Coverage Planning
(CCP) problem. These robots can extend the covered areas by reconfiguring their shape to …

Cited by 8 · Related articles · All 2 versions

Improving deep reinforcement learning by reducing the chain effect of value and policy churn

H Tang, G Berseth - arXiv preprint arXiv:2409.04792, 2024 - arxiv.org

Deep neural networks provide Reinforcement Learning (RL) powerful function
approximators to address large-scale decision-making problems. However, these …

Cited by 2 · Related articles · All 3 versions

General policy evaluation and improvement by learning to identify few but crucial states

F Faccio, A Ramesh, V Herrmann, J Harb… - arXiv preprint arXiv …, 2022 - arxiv.org

Learning to evaluate and improve policies is a core problem of Reinforcement Learning
(RL). Traditional RL algorithms learn a value function defined for a single policy. A recently …

Cited by 10 · Related articles · All 5 versions

Representation-driven reinforcement learning

O Nabati, G Tennenholtz… - … Conference on Machine …, 2023 - proceedings.mlr.press

We present a representation-driven framework for reinforcement learning. By representing
policies as estimates of their expected values, we leverage techniques from contextual …

Cited by 2 · Related articles · All 7 versions

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges

X Liu, J Jiao, J Zhang - arXiv preprint arXiv:2401.00031, 2023 - arxiv.org

Decision-making is a dynamic process requiring perception, memory, and reasoning to
make choices and find optimal policies. Traditional approaches to decision-making suffer …

Cited by 1 · Related articles · All 2 versions

Adaptive Optimization in Evolutionary Reinforcement Learning Using Evolutionary Mutation Rates

Y Zhao, Y Ding, Y Pei - IEEE Access, 2024 - ieeexplore.ieee.org

Deep reinforcement learning (DRL) has achieved notable success in continuous control
tasks. However, it faces challenges that limit its applicability to a wider array of tasks …

Cited by 1 · Related articles · All 2 versions

Pandr: Fast adaptation to new environments from offline experiences via decoupling policy and environment representations

T Sang, H Tang, Y Ma, J Hao, Y Zheng, Z Meng… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep Reinforcement Learning (DRL) has been a promising solution to many complex
decision-making problems. Nevertheless, the notorious weakness in generalization among …

Cited by 7 · Related articles · All 4 versions

PoRank: A Practical Framework for Learning to Rank Policies

P Gu, M Zhao, X He, Y Cai, B An - … of the Thirty-Third International Joint …, 2024 - ijcai.org

In many real-world scenarios, we need to select from a set of candidate policies before
online deployment. Although existing off-policy evaluation (OPE) methods can be used to …

Related articles · All 3 versions
