AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Contrastive learning as goal-conditioned reinforcement learning

B Eysenbach, T Zhang, S Levine… - Advances in Neural …, 2022 - proceedings.neurips.cc
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …

RoboCLIP: One demonstration is enough to learn robot policies

S Sontakke, J Zhang, S Arnold… - Advances in …, 2024 - proceedings.neurips.cc
Reward specification is a notoriously difficult problem in reinforcement learning, requiring
extensive expert supervision to design robust reward functions. Imitation learning (IL) …

Maximum entropy RL (provably) solves some robust RL problems

B Eysenbach, S Levine - arXiv preprint arXiv:2103.06257, 2021 - arxiv.org
Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …

Learning language-conditioned robot behavior from offline data and crowd-sourced annotation

S Nair, E Mitchell, K Chen… - Conference on Robot …, 2022 - proceedings.mlr.press
We study the problem of learning a range of vision-based manipulation tasks from a large
offline dataset of robot interaction. In order to accomplish this, humans need easy and …

End-to-end robotic reinforcement learning without reward engineering

A Singh, L Yang, K Hartikainen, C Finn… - arXiv preprint arXiv …, 2019 - arxiv.org
The combination of deep neural network models and reinforcement learning algorithms can
make it possible to learn policies for robotic behaviors that directly read in raw sensory …

SOLAR: Deep structured representations for model-based reinforcement learning

M Zhang, S Vikram, L Smith, P Abbeel… - International …, 2019 - proceedings.mlr.press
Model-based reinforcement learning (RL) has proven to be a data-efficient
approach for learning control tasks but is difficult to utilize in domains with complex …

Global optimality guarantees for policy gradient methods

J Bhandari, D Russo - Operations Research, 2024 - pubsonline.informs.org
Policy gradient methods apply to complex, poorly understood control problems by
performing stochastic gradient descent over a parameterized class of policies. Unfortunately …
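The snippet above describes policy gradient methods as stochastic gradient steps over a parameterized class of policies. As a hedged illustration only (not the method or analysis of Bhandari and Russo), the sketch below runs REINFORCE, a standard score-function policy gradient estimator, with a softmax policy on a toy three-armed bandit. The function names, reward values, step size, and step count are hypothetical choices made for this example.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Illustrative REINFORCE on a toy bandit (hypothetical setup).

    rewards: assumed fixed reward for each arm.
    Returns the learned action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one logit per action: the policy parameters
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        a = rng.choices(range(len(rewards)), weights=probs)[0]
        r = rewards[a]
        for i in range(len(theta)):
            # For a softmax policy, grad log pi(a) w.r.t. theta_i = 1[i == a] - pi_i.
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            # Ascent on expected reward (equivalently, descent on cost = -reward).
            theta[i] += lr * r * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.1, 0.5, 1.0]))
```

Each update uses one sampled action, so the step is an unbiased but noisy estimate of the gradient of expected reward. Subtracting a baseline from `r` is a common way to reduce that noise; it is omitted here to keep the sketch minimal.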

Can foundation models perform zero-shot task specification for robot manipulation?

Y Cui, S Niekum, A Gupta, V Kumar… - … for dynamics and …, 2022 - proceedings.mlr.press
Task specification is at the core of programming autonomous robots. A low-effort modality for
task specification is critical for engagement of non-expert end users and ultimate adoption of …