Q-learning algorithms: A comprehensive classification and applications

B Jang, M Kim, G Harerimana, JW Kim - IEEE Access, 2019 - ieeexplore.ieee.org
Q-learning is arguably one of the most applied representative reinforcement learning
approaches and one of the off-policy strategies. Since the emergence of Q-learning, many …

Reinforcement learning for swarm robotics: An overview of applications, algorithms and simulators

MA Blais, MA Akhloufi - Cognitive Robotics, 2023 - Elsevier
Robots such as drones, ground rovers, underwater vehicles and industrial robots have
increased in popularity in recent years. Many sectors have benefited from this by increasing …

A survey and critique of multiagent deep reinforcement learning

P Hernandez-Leal, B Kartal, ME Taylor - Autonomous Agents and Multi …, 2019 - Springer
Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has
led to a dramatic increase in the number of applications and methods. Recent works have …

Adversarial evaluation of autonomous vehicles in lane-change scenarios

B Chen, X Chen, Q Wu, L Li - IEEE Transactions on Intelligent …, 2021 - ieeexplore.ieee.org
Autonomous vehicles must be comprehensively evaluated before deployed in cities and
highways. However, most existing evaluation approaches for autonomous vehicles are static …

Subgaussian and differentiable importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - Advances in neural …, 2021 - proceedings.neurips.cc
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …

Is multiagent deep reinforcement learning the answer or the question? A brief survey

P Hernandez-Leal, B Kartal, ME Taylor - learning, 2018 - researchgate.net
Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has
led to a dramatic increase in the number of applications and methods. Recent works have …

Experience selection in deep reinforcement learning for control

T De Bruin, J Kober, K Tuyls, R Babuška - Journal of Machine Learning …, 2018 - jmlr.org
Experience replay is a technique that allows off-policy reinforcement-learning methods to
reuse past experiences. The stability and speed of convergence of reinforcement learning …

AI research considerations for human existential safety (ARCHES)

A Critch, D Krueger - arXiv preprint arXiv:2006.04948, 2020 - arxiv.org
Framed in positive terms, this report examines how technical AI research might be steered in
a manner that is more attentive to humanity's long-term prospects for survival as a species …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Expected policy gradients

K Ciosek, S Whiteson - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG)
and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected …