Towards characterizing divergence in deep q-learning

A Gunjan, S Bhattacharyya - Artificial Intelligence Review, 2023 - Springer

Portfolio optimization has always been a challenging proposition in finance and
management. Portfolio optimization facilitates in selection of portfolios in a volatile market …

Cited by 112 · Related articles · All 6 versions

Conservative q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Cited by 1996 · Related articles · All 10 versions

Contrastive learning as goal-conditioned reinforcement learning

B Eysenbach, T Zhang, S Levine… - Advances in Neural …, 2022 - proceedings.neurips.cc

In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …

Cited by 134 · Related articles · All 6 versions

A survey and critique of multiagent deep reinforcement learning

P Hernandez-Leal, B Kartal, ME Taylor - Autonomous Agents and Multi …, 2019 - Springer

Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has
led to a dramatic increase in the number of applications and methods. Recent works have …

Cited by 685 · Related articles · All 8 versions

Tensor programs ii: Neural tangent kernel for any architecture

G Yang - arXiv preprint arXiv:2006.14548, 2020 - arxiv.org

We prove that a randomly initialized neural network of *any architecture* has its Tangent
Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity. We …

Cited by 148 · Related articles · All 2 versions

Softmax deep double deterministic policy gradients

L Pan, Q Cai, L Huang - Advances in neural information …, 2020 - proceedings.neurips.cc

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep
Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can …

Cited by 106 · Related articles · All 7 versions

Iris: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data

A Mandlekar, F Ramos, B Boots… - … on Robotics and …, 2020 - ieeexplore.ieee.org

Learning from offline task demonstrations is a problem of great interest in robotics. For
simple short-horizon manipulation tasks with modest variation in task instances, offline …

Cited by 128 · Related articles · All 3 versions

Discor: Corrective feedback in reinforcement learning via distribution correction

A Kumar, A Gupta, S Levine - Advances in Neural …, 2020 - proceedings.neurips.cc

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is
notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons …

Cited by 123 · Related articles · All 7 versions

Introduction to reinforcement learning

Z Ding, Y Huang, H Yuan, H Dong - Deep reinforcement learning …, 2020 - Springer

In this chapter, we introduce the fundamentals of classical reinforcement learning and
provide a general overview of deep reinforcement learning. We first start with the basic …

Cited by 149 · Related articles · All 5 versions

Finite-time analysis of whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc

Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Cited by 13 · Related articles · All 8 versions
