Variational inverse control with events: A general framework for data-driven reward definition

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 206 · Related articles · All 3 versions

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 4533 · Related articles · All 2 versions

[PDF] neurips.cc

Contrastive learning as goal-conditioned reinforcement learning

B Eysenbach, T Zhang, S Levine… - Advances in Neural …, 2022 - proceedings.neurips.cc

In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …

Cited by 135 · Related articles · All 6 versions

[PDF] neurips.cc

RoboCLIP: One demonstration is enough to learn robot policies

S Sontakke, J Zhang, S Arnold… - Advances in …, 2024 - proceedings.neurips.cc

Reward specification is a notoriously difficult problem in reinforcement learning, requiring
extensive expert supervision to design robust reward functions. Imitation learning (IL) …

Cited by 52 · Related articles · All 7 versions

[PDF] arxiv.org

Maximum entropy RL (provably) solves some robust RL problems

B Eysenbach, S Levine - arXiv preprint arXiv:2103.06257, 2021 - arxiv.org

Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …

Cited by 200 · Related articles · All 4 versions

[PDF] mlr.press

Learning language-conditioned robot behavior from offline data and crowd-sourced annotation

S Nair, E Mitchell, K Chen… - Conference on Robot …, 2022 - proceedings.mlr.press

We study the problem of learning a range of vision-based manipulation tasks from a large
offline dataset of robot interaction. In order to accomplish this, humans need easy and …

Cited by 158 · Related articles · All 5 versions

[PDF] arxiv.org

End-to-end robotic reinforcement learning without reward engineering

A Singh, L Yang, K Hartikainen, C Finn… - arXiv preprint arXiv …, 2019 - arxiv.org

The combination of deep neural network models and reinforcement learning algorithms can
make it possible to learn policies for robotic behaviors that directly read in raw sensory …

Cited by 325 · Related articles · All 7 versions

[PDF] mlr.press

SOLAR: Deep structured representations for model-based reinforcement learning

M Zhang, S Vikram, L Smith, P Abbeel… - International …, 2019 - proceedings.mlr.press

Model-based reinforcement learning (RL) has proven to be a data-efficient
approach for learning control tasks but is difficult to utilize in domains with complex …

Cited by 312 · Related articles · All 7 versions

[HTML] informs.org

Global optimality guarantees for policy gradient methods

J Bhandari, D Russo - Operations Research, 2024 - pubsonline.informs.org

Policy gradient methods apply to complex, poorly understood control problems by
performing stochastic gradient descent over a parameterized class of policies. Unfortunately …

Cited by 285 · Related articles · All 7 versions

[PDF] mlr.press

Can foundation models perform zero-shot task specification for robot manipulation?

Y Cui, S Niekum, A Gupta, V Kumar… - … for dynamics and …, 2022 - proceedings.mlr.press

Task specification is at the core of programming autonomous robots. A low-effort modality for
task specification is critical for engagement of non-expert end users and ultimate adoption of …

Cited by 87 · Related articles · All 5 versions