Bandit algorithms: A comprehensive review and their dynamic selection from a portfolio for multicriteria top-k recommendation

A Letard, N Gutowski, O Camp, T Amghar - Expert Systems with …, 2024 - Elsevier
This paper discusses the use of portfolio approaches based on bandit algorithms to optimize
multicriteria decision-making in recommender systems (accuracy and diversity). While …

Evolutionary Reinforcement Learning: A Systematic Review and Future Directions

Y Lin, F Lin, G Cai, H Chen, L Zou, P Wu - arXiv preprint arXiv:2402.13296, 2024 - arxiv.org
In response to the limitations of reinforcement learning and evolutionary algorithms (EAs) in
complex problem-solving, Evolutionary Reinforcement Learning (EvoRL) has emerged as a …

Improving proximal policy optimization with alpha divergence

H Xu, Z Yan, J Xuan, G Zhang, J Lu - Neurocomputing, 2023 - Elsevier
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning,
which is formulated as an unconstrained optimization problem including two terms …

An on-site-based opportunistic routing protocol for scalable and energy-efficient underwater acoustic sensor networks

R Zhu, X Huang, X Huang, D Li, Q Yang - Applied Sciences, 2022 - mdpi.com
With the advancements in wireless sensor networks and the Internet of Underwater Things
(IoUT), underwater acoustic sensor networks (UASNs) have attracted much attention, which …

Applications of Reinforcement Learning in Finance--Trading with a Double Deep Q-Network

F Zejnullahu, M Moser, J Osterrieder - arXiv preprint arXiv:2206.14267, 2022 - arxiv.org
This paper presents a Double Deep Q-Network algorithm for trading single assets, namely
the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for …

Demonstration Guided Multi-Objective Reinforcement Learning

J Lu, P Mannion, K Mason - arXiv preprint arXiv:2404.03997, 2024 - arxiv.org
Multi-objective reinforcement learning (MORL) is increasingly relevant due to its
resemblance to real-world scenarios requiring trade-offs between multiple objectives …

Cognitive mapping and episodic memory emerge from simple associative learning rules

ED Gribkova, G Chowdhary, R Gillette - Neurocomputing, 2024 - Elsevier
Episodic memory enables animals to map contexts and environmental features in space and
time but is underused in artificial intelligence (AI). Here we show how simple associative …

From Learning Agents to Agile Software: Reinforcement Learning's Transformative Role in Requirements Engineering

FF Parsa, AAA Moghadam, T Ashuri - SoutheastCon 2024, 2024 - ieeexplore.ieee.org
This paper studies the transformative role of Reinforcement Learning for Requirements
Engineering in the context of software development. The integration of Reinforcement …

Lunar Lander Using Reinforcement Learning Algorithm

R Mali, N Kande, S Mandwade… - 2023 7th International …, 2023 - ieeexplore.ieee.org
Given the harsh and dynamic lunar surface environment, developing autonomous lunar landing
has proven to be a difficult problem for space agencies. Reinforcement learning (RL) has …

Cloud Elasticity of Microservices-based Applications: A Survey

MH Fourati, S Marzouk, M Jmaiel - 2024 - researchsquare.com
Elasticity is an essential treatment in Cloud environment employed in academic and
industrial contexts. The main purpose of elasticity is to reduce the deployment cost while …