F Bianchi, FL Presti - … on Modeling, Analysis and Simulation of …, 2016 - ieeexplore.ieee.org
… algorithm, named MCRR (Markov Chain with Rewards Ranking) for the VNE … reward of a suitable Markov Chain. The intuition is that the accumulated reward, with each node reward …
F Bianchi, FL Presti - Systems Modeling: Methodologies and Tools, 2019 - Springer
… Following [6], for MCRR-LA we adopt a value of γ = 0.98 (the discount factor of the Markov Reward Model). In the implementation of the QoS-RS algorithm, we do not consider the node …
G Xiong, S Wang, J Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
… reward and multiple actions, where the state of each arm evolves according to a Markov decision process (MDP), and the reward … makings, rather than using a heuristic one or black-box …
T Keller, M Helmert - Proceedings of the International Conference on …, 2013 - ojs.aaai.org
… reward function, but as the problems of minimizing costs and maximizing rewards are equivalent we use reward-… Almost all algorithms we are aware of perform Monte-Carlo sampling, i.e. …
H Chen, C Wang, J Huang, J Gong - Swarm and Evolutionary Computation, 2021 - Elsevier
… In other words, a reward received by the … heuristic accelerated Markov games with XCS (HAMXCS) to solve competitive Markov games, which incorporates provided rough heuristic …
… to accelerate the discovery of rewards. By maximizing the … Under the sparse-reward setting, we show that the proposed … to learn winning strategies under the sparse-reward setting. …
S Miura - Proceedings of the International Symposium on …, 2022 - ojs.aaai.org
… the first heuristic search algorithm for … -aware behaviors called Observer-Aware MDP (OAMDP) (Miura and Zilberstein 2021). OAMDP is a variant of MDP with non-Markovian rewards, …
… on the formal concept that is known as a Markov Decision Process (MDP). Literature on … of an MDP’s reward function by specification of a set of factored reward functions. In a finite-…
… not aware of any regret lower bound for the online Frank-Wolfe algorithm that involves β. … with existing heuristics on reinforcement learning algorithms assuming scalar rewards for …