Causal machine learning: A survey and open problems

J Kaddour, A Lynch, Q Liu, MJ Kusner… - arXiv preprint arXiv …, 2022 - arxiv.org
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods
that formalize the data-generation process as a structural causal model (SCM). This …
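
As a concrete illustration of the SCM framing (a minimal sketch, not taken from the survey: the two-variable graph, the linear equations, the coefficient 2.0, and the Gaussian noise are all illustrative assumptions), the following shows what an intervention do(X = 1) means:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Exogenous noise terms, one per endogenous variable.
u_x = rng.normal(size=n)
u_y = rng.normal(size=n)

# Structural equations: each variable is a function of its
# parents plus its own noise term. Here the graph is X -> Y.
x = u_x
y = 2.0 * x + u_y          # illustrative causal effect of 2.0

# The intervention do(X = 1) replaces X's structural equation
# while keeping Y's equation (and u_y) fixed.
x_do = np.ones(n)
y_do = 2.0 * x_do + u_y

print(y.mean(), y_do.mean())   # E[Y] ~ 0 vs. E[Y | do(X=1)] ~ 2
```

Only X's equation is replaced under the intervention; with a confounder added between X and Y, the interventional mean E[Y | do(X=1)] and the observational conditional E[Y | X=1] would differ.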

Kernel instrumental variable regression

R Singh, M Sahani, A Gretton - Advances in Neural …, 2019 - proceedings.neurips.cc
Instrumental variable (IV) regression is a strategy for learning causal relationships in
observational data. If measurements of input X and output Y are confounded, the causal …
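
A minimal sketch of the two-stage structure behind kernel IV regression, collapsing the paper's two-sample scheme into one sample for brevity; the Gaussian kernel, its bandwidth, the regularizer `lam`, and the simulated confounded data are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, s=1.0):
    # Gaussian kernel matrix between 1-D sample vectors a and b.
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * s**2))

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # unobserved confounder
x = z + u + 0.1 * rng.normal(size=n)   # treatment, confounded
y = np.sin(x) + u                      # outcome; true f(x) = sin(x)

lam = 1e-3
Kzz, Kxx = rbf(z, z), rbf(x, x)

# Stage 1: kernel ridge regression from Z to kernel features of X
# (an estimate of the conditional mean embedding of X given Z).
A = np.linalg.solve(Kzz + n * lam * np.eye(n), Kzz)

# Stage 2: ridge-regress Y on the stage-1 projected features of X.
W = Kxx @ A
alpha = np.linalg.solve(W @ W.T + n * lam * Kxx, W @ y)

x_test = np.linspace(-2.0, 2.0, 5)
print(np.c_[x_test, rbf(x_test, x) @ alpha, np.sin(x_test)])
```

Plain ridge regression of y on x would absorb the confounder u into the fit; the instrument z carries variation in x that is independent of u, which is what the two stages exploit.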

RL for latent MDPs: Regret guarantees and a lower bound

J Kwon, Y Efroni, C Caramanis… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this work, we consider the regret minimization problem for reinforcement learning in latent
Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of …
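
As a worked definition (the notation below is illustrative, not copied from the paper):

```latex
% In an LMDP, nature draws an MDP M_m from a finite set
% {M_1, ..., M_L} with mixing weights w at the start of each
% episode; the agent never observes the index m. Regret over K
% episodes compares against the best single policy for the mixture:
\[
\mathrm{Regret}(K)
  = \sum_{k=1}^{K}
    \Bigl( \max_{\pi}\, \mathbb{E}_{m \sim w}\bigl[ V^{\pi}_{M_m} \bigr]
           - \mathbb{E}_{m \sim w}\bigl[ V^{\pi_k}_{M_m} \bigr] \Bigr).
\]
```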

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework that is general enough to include …
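
Schematically, the bilinear structure looks as follows; the embeddings X_h, W_h and the loss ℓ_h are stand-ins for the paper's notation, so treat this as a sketch of the shape of the assumption rather than its exact statement:

```latex
% The average Bellman-style error of a candidate f, evaluated under
% the roll-in policy of another candidate f', factorizes at step h:
\[
\Bigl|\, \mathbb{E}_{\pi_{f'}}\!\bigl[ \ell_h(f; \tau_h) \bigr] \Bigr|
  = \bigl|\, \langle X_h(f'),\, W_h(f) \rangle \,\bigr|,
\]
% i.e. the error is bilinear in two finite-dimensional embeddings,
% which is the structure a sample-efficient learner can exploit.
```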

Optimistic MLE: A generic model-based algorithm for partially observable sequential decision making

Q Liu, P Netrapalli, C Szepesvari, C Jin - Proceedings of the 55th …, 2023 - dl.acm.org
This paper introduces a simple and efficient learning algorithm for general sequential
decision making. The algorithm combines Optimism for exploration with Maximum Likelihood …
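
A hedged sketch of the optimism-plus-MLE loop on a toy two-armed Bernoulli bandit; the finite model grid, the confidence radius `beta`, and the bandit setting itself are illustrative simplifications of the general partially observable setting the paper treats:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.4, 0.7])        # unknown to the learner

grid = np.linspace(0.05, 0.95, 10)
models = np.array([(p, q) for p in grid for q in grid])  # model class
loglik = np.zeros(len(models))
beta = 3.0                               # log-likelihood slack
counts = np.zeros(2)

for t in range(2000):
    # Confidence set: all models whose likelihood is near the MLE's.
    alive = loglik >= loglik.max() - beta
    # Optimism: pick the surviving model promising the highest value,
    # then act optimally as if that model were true.
    cand = models[alive]
    best = cand[cand.max(axis=1).argmax()]
    a = int(best.argmax())
    r = float(rng.random() < true_means[a])
    # MLE update: fold this observation into every model's likelihood.
    p = models[:, a]
    loglik += np.log(np.where(r == 1.0, p, 1.0 - p))
    counts[a] += 1

print("pulls per arm:", counts)          # should favor arm 1
```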

PAC reinforcement learning for predictive state representations

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2207.05738, 2022 - arxiv.org
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical
systems. We focus on the Predictive State Representations (PSRs) model, which is an …
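
For context, the core object in a (linear) PSR, in standard notation rather than anything specific to this paper:

```latex
% The predictive state after history h is the vector of success
% probabilities of a core set of tests t_1, ..., t_d (future
% action-observation sequences); in a linear PSR, every further
% test t is a fixed linear functional of that state:
\[
q(h) = \bigl( \mathbb{P}(t_1 \mid h), \dots, \mathbb{P}(t_d \mid h) \bigr)^{\top},
\qquad
\mathbb{P}(t \mid h) = m_t^{\top} q(h).
\]
```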

A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes

C Shi, M Uehara, J Huang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
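
Schematically, minimax estimation of a bridge function from a conditional moment restriction has the following shape; the moment m, the test class F, and the proxy/instrument Z are generic placeholders, not the paper's exact estimand:

```latex
% A bridge function b is pinned down by E[ m(O; b) | Z ] = 0 and
% estimated as a saddle point over a class F of test functions:
\[
\hat b = \arg\min_{b \in \mathcal{B}} \, \max_{f \in \mathcal{F}}
  \; \mathbb{E}_n\!\bigl[ m(O; b)\, f(Z) \bigr]
  \;-\; \tfrac{1}{2}\, \mathbb{E}_n\!\bigl[ f(Z)^2 \bigr].
\]
```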

GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond

H Zhong, W Xiong, S Zheng, L Wang, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We study sample-efficient reinforcement learning (RL) under the general framework of
interactive decision making, which includes Markov decision processes (MDPs), partially …

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

Causal imitation learning under temporally correlated noise

G Swamy, S Choudhury, D Bagnell… - … on Machine Learning, 2022 - proceedings.mlr.press
We develop algorithms for imitation learning from policy data corrupted by temporally
correlated noise in expert actions. When noise affects multiple timesteps of …
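
A minimal sketch of the de-confounding idea (not the paper's algorithms): when the noise e_t corrupts two consecutive actions, the current state is correlated with its own action noise through the previous action, so naive regression of actions on states is biased, while an earlier state is a valid instrument. Linear dynamics, a linear expert policy, MA(1) noise, and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
theta = 0.8                        # true expert policy: a = theta * s

e = rng.normal(size=T)             # noise shared across two timesteps
s = np.zeros(T)
a = np.zeros(T)
for t in range(1, T - 1):
    a[t] = theta * s[t] + e[t] + 0.9 * e[t - 1]  # MA(1) action noise
    s[t + 1] = 0.5 * s[t] + 0.5 * a[t]           # noise leaks into state

st, at, z = s[2:-1], a[2:-1], s[1:-2]  # instrument: the previous state

ols = (st @ at) / (st @ st)        # naive behavioral cloning: biased
iv = (z @ at) / (z @ st)           # instrumented regression: consistent
print(f"OLS {ols:.3f}   IV {iv:.3f}   true theta {theta}")
```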