Scaling data-driven robotics with reward sketching and batch reinforcement learning

S Cabi, SG Colmenarejo, A Novikov… - arXiv preprint arXiv …, 2019 - arxiv.org
We present a framework for data-driven robotics that makes use of a large dataset of
recorded robot experience and scales to several tasks using learned reward functions. We …

Asking easy questions: A user-friendly approach to active reward learning

E Bıyık, M Palan, NC Landolfi, DP Losey… - arXiv preprint arXiv …, 2019 - arxiv.org
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …

Machine teaching for inverse reinforcement learning: Algorithms and applications

DS Brown, S Niekum - Proceedings of the AAAI Conference on Artificial …, 2019 - ojs.aaai.org
Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing
for policy improvement and generalization. However, despite much recent interest in IRL …

Interactive teaching algorithms for inverse reinforcement learning

P Kamalaruban, R Devidze, V Cevher… - arXiv preprint arXiv …, 2019 - arxiv.org
We study the problem of inverse reinforcement learning (IRL) with the added twist that the
learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic …

Incomplete contracting and AI alignment

D Hadfield-Menell, GK Hadfield - Proceedings of the 2019 AAAI/ACM …, 2019 - dl.acm.org
We suggest that the analysis of incomplete contracting developed by law and economics
researchers can provide a useful framework for understanding the AI alignment problem and …

Preferences implicit in the state of the world

R Shah, D Krasheninnikov, J Alexander… - arXiv preprint arXiv …, 2019 - arxiv.org
Reinforcement learning (RL) agents optimize only the features specified in a reward function
and are indifferent to anything left out inadvertently. This means that we must not only …

Batch active learning using determinantal point processes

E Bıyık, K Wang, N Anari, D Sadigh - arXiv preprint arXiv:1906.07975, 2019 - arxiv.org
Data collection and labeling is one of the main challenges in employing machine learning
algorithms in a variety of real-world applications with limited data. While active learning …

Active learning of reward dynamics from hierarchical queries

C Basu, E Bıyık, Z He, M Singhal… - 2019 IEEE/RSJ …, 2019 - ieeexplore.ieee.org
Enabling robots to act according to human preferences across diverse environments is a
crucial task, extensively studied by both roboticists and machine learning researchers. To …

The green choice: Learning and influencing human decisions on shared roads

E Bıyık, DA Lazar, D Sadigh… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
Autonomous vehicles have the potential to increase the capacity of roads via platooning,
even when human drivers and autonomous vehicles share roads. However, when users of a …

Verifying robustness of human-aware autonomous cars

D Sadigh, SS Sastry, SA Seshia - IFAC-PapersOnLine, 2019 - Elsevier
As human-robot systems make their way into our everyday life, safety has become a core
concern of the learning algorithms used by such systems. Examples include semi …