OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

J Lee, W Jeon, BJ Lee, J Pineau, KE Kim - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We consider the offline reinforcement learning (RL) setting, where the agent aims to optimize
the policy solely from the data, without further environment interactions. In offline RL,
distributional shift, which arises when the target policy being optimized deviates from the
behavior policy used for data collection, becomes the primary source of difficulty. This shift
typically causes overestimation of action values, which poses severe problems for model-
free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms …
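The overestimation mechanism the abstract alludes to can be seen in a few lines: when a bootstrapped target takes a max over noisy action-value estimates, zero-mean estimation error turns into a positive bias, and the bias is largest for actions the behavior policy rarely took. A minimal numpy sketch of this effect (the action count and noise scales are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: every action's true value is 0, but the offline Q estimates
# carry zero-mean noise -- larger for actions the behavior policy rarely took.
n_actions = 10
n_trials = 10_000
noise_scale = np.linspace(0.1, 1.0, n_actions)  # rarely-seen actions -> noisier estimates

targets = []
for _ in range(n_trials):
    q_hat = rng.normal(0.0, noise_scale)  # noisy estimates of the true value 0
    targets.append(q_hat.max())           # bootstrapped target uses max_a Q(s', a)

print("true value of the best action: 0.0")
print(f"mean bootstrapped target: {np.mean(targets):.3f}")  # clearly > 0
```

The mean target comes out well above the true value of 0, and bootstrapping then propagates this error through subsequent updates. As the title indicates, OptiDICE avoids this by estimating stationary distribution correction ratios between the target policy's state-action distribution and the dataset's, rather than relying on bootstrapped action values.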
