Fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism | S Eliad, I Hakimi, A De Jagger, M Silberstein, A Schuster | 2021 USENIX Annual Technical Conference (USENIX ATC 21), 381-396, 2021 | Cited by: 34 | 2021
Faster neural network training with approximate tensor operations | M Adelman, K Levy, I Hakimi, M Silberstein | Advances in Neural Information Processing Systems 34, 27877-27889, 2021 | Cited by: 33 | 2021
Asynchronous distributed learning: Adapting to gradient delays without prior knowledge | RZ Aviv, I Hakimi, A Schuster, KY Levy | International Conference on Machine Learning, 436-445, 2021 | Cited by: 27* | 2021
Taming momentum in a distributed asynchronous environment | I Hakimi, S Barkai, M Gabel, A Schuster | arXiv preprint arXiv:1907.11612, 2019 | Cited by: 19 | 2019
Gap-Aware Mitigation of Gradient Staleness | S Barkai, I Hakimi, A Schuster | International Conference on Learning Representations (ICLR), 2020 | Cited by: 17 | 2020
q2d: Turning questions into dialogs to teach models how to search | Y Bitton, S Cohen-Ganor, I Hakimi, Y Lewenberg, R Aharoni, E Weinreb | arXiv preprint arXiv:2304.14318, 2023 | Cited by: 3 | 2023
Laga: Lagged allreduce with gradient accumulation for minimal idle time | I Hakimi, RZ Aviv, KY Levy, A Schuster | 2021 IEEE International Conference on Data Mining (ICDM), 171-180, 2021 | Cited by: 3 | 2021
SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer | R Cohen, I Hakimi, A Schuster | Proceedings of the 51st International Conference on Parallel Processing, 1-10, 2022 | Cited by: 1 | 2022
Gap-aware mitigation of gradient staleness | A Schuster, S Barkai, I Hakimi | US Patent 11,631,035, 2023 | | 2023
Distributed Deep Neural Networks | I Hakimi, A Schuster | Computer Science Department, Technion, 2022 | | 2022