Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs L Shani, Y Efroni, S Mannor Thirty-Fourth AAAI Conference on Artificial Intelligence, 5668-5675, 2020 | 185 | 2020 |
Optimistic Policy Optimization with Bandit Feedback Y Efroni, L Shani, A Rosenberg, S Mannor Proceedings of the 37th International Conference on Machine Learning 119 …, 2020 | 95 | 2020 |
Mirror Descent Policy Optimization M Tomar, L Shani, Y Efroni, M Ghavamzadeh The Tenth International Conference on Learning Representations, 2020 | 64 | 2020 |
Factually consistent summarization via reinforcement learning with textual entailment feedback P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, ... arXiv preprint arXiv:2306.00186, 2023 | 42 | 2023 |
Online apprenticeship learning L Shani, T Zahavy, S Mannor Proceedings of the AAAI conference on artificial intelligence 36 (8), 8240-8248, 2022 | 26 | 2022 |
Exploration Conscious Reinforcement Learning Revisited L Shani, Y Efroni, S Mannor Proceedings of the 36th International Conference on Machine Learning, 5680-5689, 2019 | 19* | 2019 |
Demystifying embedding spaces using large language models G Tennenholtz, Y Chow, CW Hsu, J Jeong, L Shani, A Tulepbergenov, ... arXiv preprint arXiv:2310.04475, 2023 | 5 | 2023 |
Reinforcement learning with history dependent dynamic contexts G Tennenholtz, N Merlis, L Shani, M Mladenov, C Boutilier International Conference on Machine Learning, 34011-34053, 2023 | 3 | 2023 |
Reinforcement learning with a terminator G Tennenholtz, N Merlis, L Shani, S Mannor, U Shalit, G Chechik, ... Advances in Neural Information Processing Systems 35, 35696-35709, 2022 | 3 | 2022 |
Multi instance learning for unbalanced data M Kozdoba, E Moroshko, L Shani, T Takagi, T Katoh, S Mannor, ... arXiv preprint arXiv:1812.07010, 2018 | 1 | 2018 |
Offline Regularised Reinforcement Learning for Large Language Models Alignment PH Richemond, Y Tang, D Guo, D Calandriello, MG Azar, R Rafailov, ... arXiv preprint arXiv:2405.19107, 2024 | | 2024 |
Embedding-Aligned Language Models G Tennenholtz, Y Chow, CW Hsu, L Shani, E Liang, C Boutilier arXiv preprint arXiv:2406.00024, 2024 | | 2024 |
Multi-turn Reinforcement Learning from Preference Human Feedback L Shani, A Rosenberg, A Cassel, O Lang, D Calandriello, A Zipori, ... arXiv preprint arXiv:2405.14655, 2024 | | 2024 |