Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 843 | 2023 |
Reinforcement learning for integer programming: Learning to cut Y Tang, S Agrawal, Y Faenza International conference on machine learning, 9367-9376, 2020 | 203 | 2020 |
Es-maml: Simple hessian-free meta learning X Song, W Gao, Y Yang, K Choromanski, A Pacchiano, Y Tang arXiv preprint arXiv:1910.01215, 2019 | 131 | 2019 |
Discretizing continuous action space for on-policy optimization Y Tang, S Agrawal Proceedings of the aaai conference on artificial intelligence 34 (04), 5981-5988, 2020 | 119 | 2020 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 81 | 2024 |
Monte-Carlo tree search as regularized policy optimization JB Grill, F Altché, Y Tang, T Hubert, M Valko, I Antonoglou, R Munos International Conference on Machine Learning, 3769-3778, 2020 | 70 | 2020 |
Byol-explore: Exploration by bootstrapped prediction Z Guo, S Thakoor, M Pîslar, B Avila Pires, F Altché, C Tallec, A Saade, ... Advances in neural information processing systems 35, 31855-31870, 2022 | 57 | 2022 |
From complexity to simplicity: Adaptive es-active subspaces for blackbox optimization KM Choromanski, A Pacchiano, J Parker-Holder, Y Tang, V Sindhwani Advances in Neural Information Processing Systems 32, 2019 | 50 | 2019 |
Orthogonal estimation of Wasserstein distances M Rowland, J Hron, Y Tang, K Choromanski, T Sarlos, A Weller The 22nd International Conference on Artificial Intelligence and Statistics …, 2019 | 47 | 2019 |
Provably robust blackbox optimization for reinforcement learning K Choromanski, A Pacchiano, J Parker-Holder, Y Tang, D Jain, Y Yang, ... CoRR, abs/1903.02993, 2019 | 42 | 2019 |
Learning to Score Behaviors for Guided Policy Optimization A Pacchiano, J Parker-Holder, Y Tang, A Choromanska, K Choromanski, ... arXiv preprint arXiv:1906.04349, 2019 | 40 | 2019 |
Exploration by distributional reinforcement learning Y Tang, S Agrawal arXiv preprint arXiv:1805.01907, 2018 | 40 | 2018 |
Boosting trust region policy optimization by normalizing flows policy Y Tang, S Agrawal arXiv preprint arXiv:1809.10326, 2018 | 33 | 2018 |
Nash learning from human feedback R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ... arXiv preprint arXiv:2312.00886, 2023 | 31 | 2023 |
Understanding self-predictive learning for reinforcement learning Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ... International Conference on Machine Learning, 33632-33656, 2023 | 26 | 2023 |
Self-imitation learning via generalized lower bound q-learning Y Tang Advances in neural information processing systems 33, 13964-13975, 2020 | 23 | 2020 |
Hindsight expectation maximization for goal-conditioned reinforcement learning Y Tang, A Kucukelbir International Conference on Artificial Intelligence and Statistics, 2863-2871, 2021 | 20 | 2021 |
Revisiting Peng’s Q() for Modern Reinforcement Learning T Kozuno, Y Tang, M Rowland, R Munos, S Kapturowski, W Dabney, ... International Conference on Machine Learning, 5794-5804, 2021 | 19 | 2021 |
Taylor expansion policy optimization Y Tang, M Valko, R Munos International Conference on Machine Learning, 9397-9406, 2020 | 19 | 2020 |
Generalized Preference Optimization: A Unified Approach to Offline Alignment Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ... arXiv preprint arXiv:2402.05749, 2024 | 15 | 2024 |