A Markov reward based resource-latency aware heuristic for the virtual network embedding problem

F Bianchi, F Lo Presti - ACM SIGMETRICS Performance Evaluation …, 2017 - dl.acm.org
… Differently from prior approaches, our metrics are based on the accumulated reward of
suitable Markov Chains. The intuition is that the accumulated reward, with each node reward

A Markov reward model based greedy heuristic for the virtual network embedding problem

F Bianchi, F Lo Presti - … on Modeling, Analysis and Simulation of …, 2016 - ieeexplore.ieee.org
… algorithm, named MCRR (Markov Chain with Rewards Ranking) for the VNE … reward of
a suitable Markov Chain. The intuition is that the accumulated reward, with each node reward

A Comparison of Markov Reward Based Resource-Latency Aware Heuristics for the Virtual Network Embedding Problem

F Bianchi, F Lo Presti - Systems Modeling: Methodologies and Tools, 2019 - Springer
… Following [6], for MCRR-LA we adopt a value of γ = 0.98 (the discount factor of the Markov
Reward Model). In the implementation of the QoS-RS algorithm, we do not consider the node …
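For reference, the discounted accumulated reward of a Markov Reward Model with discount factor γ satisfies v = r + γPv, i.e. v = Σ_t γ^t P^t r. The snippets above do not say how MCRR or MCRR-LA build the transition matrix P or the node rewards r, so the sketch below is not the authors' implementation: it assumes a hypothetical random walk over a toy three-node substrate path and hypothetical per-node rewards, and ranks nodes by their accumulated reward using γ = 0.98.

```python
# Minimal sketch (not the authors' code): discounted accumulated reward of a
# Markov Reward Model, v = r + gamma * P v, computed by fixed-point iteration.
# P, r, the toy graph and the ranking rule are hypothetical illustrations.


def accumulated_reward(P, r, gamma=0.98, tol=1e-12, max_iter=100_000):
    """Return v where v[i] = E[sum_t gamma**t * r[X_t] | X_0 = i]."""
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        # One Bellman update: own reward plus discounted expected reward of successors.
        new_v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def rank_nodes(P, r, gamma=0.98):
    """Return node indices sorted by accumulated reward, highest first."""
    v = accumulated_reward(P, r, gamma)
    return sorted(range(len(r)), key=lambda i: v[i], reverse=True)


# Hypothetical substrate path 0 - 1 - 2 with a uniform random walk on neighbours.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
r = [1.0, 4.0, 2.0]  # hypothetical node rewards, e.g. residual CPU

print(rank_nodes(P, r))
```

With γ = 0.98 the expected discounted horizon is 1/(1 − γ) = 50 steps, so a node's score is dominated by the rewards it can reach rather than by its own reward alone; on this toy graph the sketch prints [1, 2, 0].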

Learning infinite-horizon average-reward restless multi-action bandits via index awareness

G Xiong, S Wang, J Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
reward and multiple actions, where the state of each arm evolves according to a Markov
decision process (MDP), and the reward … makings, rather than using a heuristic one or black-box …

Trial-based heuristic tree search for finite horizon MDPs

T Keller, M Helmert - Proceedings of the International Conference on …, 2013 - ojs.aaai.org
reward function, but as the problems of minimizing costs and maximizing rewards are equivalent
we use reward-… Almost all algorithms we are aware of perform Monte-Carlo sampling, i.e. …

Efficient use of heuristics for accelerating XCS-based policy learning in Markov games

H Chen, C Wang, J Huang, J Gong - Swarm and Evolutionary Computation, 2021 - Elsevier
… In other words, a reward received by the … heuristic accelerated Markov games with XCS
(HAMXCS) to solve competitive Markov games, which incorporates provided rough heuristic

Subspace-aware exploration for sparse-reward multi-agent tasks

P Xu, J Zhang, Q Yin, C Yu, Y Yang… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
… to accelerate the discovery of rewards. By maximizing the … Under the sparse-reward setting,
we show that the proposed … to learn winning strategies under the sparse-reward setting. …

Beyond Markov Decision Process with Scalar Markovian Rewards

S Miura - Proceedings of the International Symposium on …, 2022 - ojs.aaai.org
… the first heuristic search algorithm for … -aware behaviors called Observer-Aware MDP (OAMDP)
(Miura and Zilberstein 2021). OAMDP is a variant of MDP with non-Markovian rewards, …

[PDF] Anytime optimal MDP planning with trial-based heuristic tree search

T Keller - 2015 - ai.dmi.unibas.ch
… on the formal concept that is known as a Markov Decision Process (MDP). Literature on …
of an MDP’s reward function by specification of a set of factored reward functions. In a finite-…

Exploration-exploitation trade-off in reinforcement learning on online Markov decision processes with global concave rewards

WC Cheung - arXiv preprint arXiv:1905.06466, 2019 - arxiv.org
… not aware of any regret lower bound for the online Frank-Wolfe algorithm that involves β. …
with existing heuristics on reinforcement learning algorithms assuming scalar rewards for …