Regret minimization for partially observable deep reinforcement learning

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 350 Related articles All 2 versions

[PDF] google.com

Causal inference and counterfactual prediction in machine learning for actionable healthcare

M Prosperi, Y Guo, M Sperrin, JS Koopman… - Nature Machine …, 2020 - nature.com

Big data, high-performance computing, and (deep) machine learning are increasingly
becoming key to precision medicine—from identifying disease risks and taking preventive …

Cited by 361 Related articles All 6 versions

[PDF] mlr.press

Deep counterfactual regret minimization

N Brown, A Lerer, S Gross… - … conference on machine …, 2019 - proceedings.mlr.press

Abstract Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large
imperfect-information games. It converges to an equilibrium by iteratively traversing the …

Cited by 289 Related articles All 7 versions

[PDF] arxiv.org

Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge

A Singla, S Padakandla… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

This paper presents our method for enabling a UAV quadrotor, equipped with a monocular
camera, to autonomously avoid collisions with obstacles in unstructured and unknown …

Cited by 255 Related articles All 8 versions

[PDF] neurips.cc

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc

Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Cited by 174 Related articles All 9 versions

[PDF] mdpi.com

Recent advances in deep reinforcement learning applications for solving partially observable markov decision processes (pomdp) problems: Part 1—fundamentals …

X Xiang, S Foo - Machine Learning and Knowledge Extraction, 2021 - mdpi.com

The first part of a two-part series of papers provides a survey on recent advances in Deep
Reinforcement Learning (DRL) applications for solving partially observable Markov decision …

Cited by 59 Related articles All 6 versions

[PDF] neurips.cc

Computing optimal equilibria and mechanisms via learning in zero-sum extensive-form games

B Zhang, G Farina, I Anagnostides… - Advances in …, 2024 - proceedings.neurips.cc

We introduce a new approach for computing optimal equilibria via learning in games. It
applies to extensive-form settings with any number of players, including mechanism design …

Cited by 13 Related articles All 7 versions

[PDF] arxiv.org

Dream: Deep regret minimization with advantage baselines and model-free learning

E Steinberger, A Lerer, N Brown - arXiv preprint arXiv:2006.10410, 2020 - arxiv.org

We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies
in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash …

Cited by 59 Related articles All 4 versions

[PDF] arxiv.org

Remix: Regret minimization for monotonic value function factorization in multiagent reinforcement learning

Y Mei, H Zhou, T Lan - arXiv preprint arXiv:2302.05593, 2023 - arxiv.org

Value function factorization methods have become a dominant approach for cooperative
multiagent reinforcement learning under a centralized training and decentralized execution …

Cited by 17 Related articles All 2 versions

[PDF] arxiv.org

Double neural counterfactual regret minimization

H Li, K Hu, Z Ge, T Jiang, Y Qi, L Song - arXiv preprint arXiv:1812.10607, 2018 - arxiv.org

Counterfactual Regret Minimization (CFR) is a fundamental and effective technique for
solving Imperfect Information Games (IIG). However, the original CFR algorithm only works …

Cited by 69 Related articles All 7 versions
