| Title | Authors | Venue | Cited by | Year |
|-------|---------|-------|----------|------|
| Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | T Zadouri, A Üstün, A Ahmadian, B Ermiş, A Locatelli, S Hooker | arXiv preprint arXiv:2309.05444 | 52 | 2023 |
| Aya 23: Open Weight Releases to Further Multilingual Progress | V Aryabumi, J Dang, D Talupuru, S Dash, D Cairuz, H Lin, B Venkitesh, ... | arXiv preprint arXiv:2405.15032 | 17 | 2024 |
| Exploring Low Rank Training of Deep Neural Networks | SR Kamalakara, A Locatelli, B Venkitesh, J Ba, Y Gal, AN Gomez | arXiv preprint arXiv:2209.13569 | 13 | 2022 |
| SnapKV: LLM Knows What You Are Looking for Before Generation | Y Li, Y Huang, B Yang, B Venkitesh, A Locatelli, H Ye, T Cai, P Lewis, ... | arXiv preprint arXiv:2404.14469 | 12 | 2024 |
| Regular cylindrical algebraic decomposition | JH Davenport, AF Locatelli, GK Sankaran | Journal of the London Mathematical Society 101 (1), 43-59 | 4 | 2020 |
| On the regularity of cylindrical algebraic decompositions | A Locatelli | PhD thesis, University of Bath | 2 | 2015 |
| To Code, or Not To Code? Exploring Impact of Code in Pre-training | V Aryabumi, Y Su, R Ma, A Morisot, I Zhang, A Locatelli, M Fadaee, ... | arXiv preprint arXiv:2408.10914 | | 2024 |
| BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Q Zhang, N Gritsch, D Gnaneshwar, S Guo, D Cairuz, B Venkitesh, ... | arXiv preprint arXiv:2408.08274 | | 2024 |
| System and Method for Low Rank Training of Neural Networks | SR Kamalakara, B Venkitesh, AN Gomez, AFN Locatelli | US Patent App. 17/814,041 | | 2023 |