aware heuristic markov reward - Academic Resource Search

A markov reward based resource-latency aware heuristic for the virtual network embedding problem

F Bianchi, F Lo Presti - ACM SIGMETRICS Performance Evaluation …, 2017 - dl.acm.org

… • Differently from prior approaches, our metrics are based on the accumulated reward of
suitable Markov Chains. The intuition is that the accumulated reward, with each node reward …

Cited by: 15 · Related articles · All 2 versions

A markov reward model based greedy heuristic for the virtual network embedding problem

F Bianchi, FL Presti - … on Modeling, Analysis and Simulation of …, 2016 - ieeexplore.ieee.org

… algorithm, named MCRR (Markov Chain with Rewards Ranking) for the VNE … reward of
a suitable Markov Chain. The intuition is that the accumulated reward, with each node reward …

Cited by: 20 · Related articles

A Comparison of Markov Reward Based Resource-Latency Aware Heuristics for the Virtual Network Embedding Problem

F Bianchi, FL Presti - Systems Modeling: Methodologies and Tools, 2019 - Springer

… Following [6], for MCRR-LA we adopt a value of γ = 0.98 (the discount factor of the Markov
Reward Model). In the implementation of the QoS-RS algorithm, we do not consider the node …

Learning infinite-horizon average-reward restless multi-action bandits via index awareness

G Xiong, S Wang, J Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc

… reward and multiple actions, where the state of each arm evolves according to a Markov
decision process (MDP), and the reward … makings, rather than using a heuristic one or black-box …

Cited by: 10 · Related articles · All 5 versions

[PDF] aaai.org

Trial-based heuristic tree search for finite horizon MDPs

T Keller, M Helmert - Proceedings of the International Conference on …, 2013 - ojs.aaai.org

… reward function, but as the problems of minimizing costs and maximizing rewards are equivalent
we use reward-… Almost all algorithms we are aware of perform Monte-Carlo sampling, i.e. …

Cited by: 152 · Related articles · All 19 versions

[PDF] arxiv.org

Efficient use of heuristics for accelerating XCS-based policy learning in Markov games

H Chen, C Wang, J Huang, J Gong - Swarm and Evolutionary Computation, 2021 - Elsevier

… In other words, a reward received by the … heuristic accelerated Markov games with XCS
(HAMXCS) to solve competitive Markov games, which incorporates provided rough heuristic …

Cited by: 5 · Related articles · All 4 versions

[PDF] aaai.org

Subspace-aware exploration for sparse-reward multi-agent tasks

P Xu, J Zhang, Q Yin, C Yu, Y Yang… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

… to accelerate the discovery of rewards. By maximizing the … Under the sparse-reward setting,
we show that the proposed … to learn winning strategies under the sparse-reward setting. …

Cited by: 2 · Related articles · All 2 versions

[PDF] aaai.org

Beyond Markov Decision Process with Scalar Markovian Rewards

S Miura - Proceedings of the International Symposium on …, 2022 - ojs.aaai.org

… the first heuristic search algorithm for … -aware behaviors called Observer-Aware MDP (OAMDP)
(Miura and Zilberstein 2021). OAMDP is a variant of MDP with non-Markovian rewards, …

[PDF] Anytime optimal MDP planning with trial-based heuristic tree search

T Keller - 2015 - ai.dmi.unibas.ch

… on the formal concept that is known as a Markov Decision Process (MDP). Literature on …
of an MDP’s reward function by specification of a set of factored reward functions. In a finite-…

Cited by: 13 · Related articles · All 3 versions

[PDF] arxiv.org

Exploration-exploitation trade-off in reinforcement learning on online markov decision processes with global concave rewards

WC Cheung - arXiv preprint arXiv:1905.06466, 2019 - arxiv.org

… not aware of any regret lower bound for the online Frank-Wolfe algorithm that involves β. …
with existing heuristics on reinforcement learning algorithms assuming scalar rewards for …

Cited by: 19 · Related articles · All 3 versions
