Title, authors, venue | Cited by | Year |
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023 | 236 | 2023 |
Red-Teaming the Stable Diffusion Safety Filter J Rando, D Paleka, D Lindner, L Heim, F Tramèr NeurIPS ML Safety Workshop, 2022 | 85 | 2022 |
Tracr: Compiled Transformers as a Laboratory for Interpretability D Lindner, J Kramár, M Rahtz, T McGrath, V Mikulik Conference on Neural Information Processing Systems (NeurIPS), 2023 | 37 | 2023 |
GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems B Sukhija, M Turchetta, D Lindner, A Krause, S Trimpe, D Baumann Artificial Intelligence, 103922, 2023 | 20 | 2023 |
Sensing Social Media Signals for Cryptocurrency News J Beck, R Huang, D Lindner, T Guo, C Zhang, D Helbing, ... Companion Proceedings of The 2019 World Wide Web Conference, 2019 | 19 | 2019 |
Active exploration for inverse reinforcement learning D Lindner, A Krause, G Ramponi Advances in Neural Information Processing Systems 35, 5843-5853, 2022 | 18 | 2022 |
Information Directed Reward Learning for Reinforcement Learning D Lindner, M Turchetta, S Tschiatschek, K Ciosek, A Krause Conference on Neural Information Processing Systems (NeurIPS), 2021 | 17 | 2021 |
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning J Rocamonde, V Montesinos, E Nava, E Perez, D Lindner arXiv preprint arXiv:2310.12921, 2023 | 15 | 2023 |
Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning D Lindner, M El-Assady Communication in Human-AI Interaction Workshop (CHAI) at IJCAI-ECAI, 2022 | 11 | 2022 |
Evaluating Frontier Models for Dangerous Capabilities M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ... arXiv preprint arXiv:2403.13793, 2024 | 7 | 2024 |
Addressing the Long-term Impact of ML Decisions via Policy Regret D Lindner, H Heidari, A Krause International Joint Conference on Artificial Intelligence (IJCAI), 2021 | 7 | 2021 |
Challenges for Using Impact Regularizers to Avoid Negative Side Effects D Lindner, K Matoba, A Meulemans SafeAI Workshop at AAAI 2021, 2021 | 7 | 2021 |
Interactively Learning Preference Constraints in Linear Bandits D Lindner, S Tschiatschek, K Hofmann, A Krause International Conference on Machine Learning (ICML), 2022 | 6 | 2022 |
Learning safety constraints from demonstrations with unknown rewards D Lindner, X Chen, S Tschiatschek, K Hofmann, A Krause International Conference on Artificial Intelligence and Statistics, 2386-2394, 2024 | 5 | 2024 |
Topological semimetals and insulators in three-dimensional honeycomb materials D Wawrzik, D Lindner, M Hermanns, S Trebst Physical Review B 98 (11), 115114, 2018 | 5 | 2018 |
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback Y Metz, D Lindner, R Baur, D Keim, M El-Assady Interactive Learning with Implicit Human Feedback Workshop at ICML, 2023 | 4 | 2023 |
Learning What To Do by Simulating the Past D Lindner, R Shah, P Abbeel, A Dragan International Conference on Learning Representations (ICLR), 2021 | 4 | 2021 |
Detecting Spiky Corruption in Markov Decision Processes J Mancuso, T Kisielewski, D Lindner, A Singh Workshop on Artificial Intelligence Safety at IJCAI 2019, 2019 | 2 | 2019 |
Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback D Lindner ETH Zurich, 2023 | | 2023 |