Incbricks: Toward in-network computation with an in-network cache M Liu, L Luo, J Nelson, L Ceze, A Krishnamurthy, K Atreya Proceedings of the Twenty-Second International Conference on Architectural …, 2017 | 170 | 2017 |
Software-hardware co-design for fast and scalable training of deep learning recommendation models D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 151* | 2022 |
Parameter hub: a rack-scale parameter server for distributed deep neural network training L Luo, J Nelson, L Ceze, A Phanishayee, A Krishnamurthy Proceedings of the ACM Symposium on Cloud Computing, 41-54, 2018 | 141 | 2018 |
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu, L Wright, H Shojanazeri, ... CoRR abs/2304.11277, 2023 | 117* | 2023 |
PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the public Cloud L Luo, P West, J Nelson, A Krishnamurthy, L Ceze Proceedings of the 3rd MLSys Conference, 2020 | 72* | 2020 |
Laser: Light, accurate sharing detection and repair L Luo, A Sriraman, B Fugate, S Hu, G Pokam, CJ Newburn, J Devietti 2016 IEEE International Symposium on High Performance Computer Architecture …, 2016 | 39 | 2016 |
Troubleshooting Transiently-Recurring Errors in Production Systems with Blame-Proportional Logging L Luo, S Nath, LR Sivalingam, M Musuvathi, L Ceze 2018 USENIX Annual Technical Conference (USENIX ATC 18), 321-334, 2018 | 22 | 2018 |
Motivating in-network aggregation for distributed deep neural network training L Luo, M Liu, J Nelson, L Ceze, A Phanishayee, A Krishnamurthy Workshop on Approximate Computing Across the Stack, 2017 | 17 | 2017 |
Parameter box: High performance parameter servers for efficient distributed deep neural network training L Luo, J Nelson, L Ceze, A Phanishayee, A Krishnamurthy MLSys 2018, 2018 | 14 | 2018 |
DHEN: A deep and hierarchical ensemble network for large-scale click-through rate prediction B Zhang, L Luo, X Liu, J Li, Z Chen, W Zhang, X Wei, Y Hao, M Tsang, ... arXiv preprint arXiv:2203.11014, 2022 | 12 | 2022 |
Srifty: Swift and thrifty distributed neural network training on the cloud L Luo, P West, P Patel, A Krishnamurthy, L Ceze Proceedings of Machine Learning and Systems 4, 833-847, 2022 | 7 | 2022 |
NetHint: White-Box networking for Multi-Tenant data centers J Chen, H Zhang, W Zhang, L Luo, J Chase, I Stoica, D Zhuo 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2022 | 7 | 2022 |
Pre-train and search: Efficient embedding table sharding with pre-trained neural cost models D Zha, L Feng, L Luo, B Bhushanam, Z Liu, Y Hu, J Nie, Y Huang, Y Tian, ... Proceedings of Machine Learning and Systems 5, 68-88, 2023 | 5 | 2023 |
Accelerating spmm kernel with cache-first edge sampling for graph neural networks CY Lin, L Luo, L Ceze arXiv preprint arXiv:2104.10716, 2021 | 3 | 2021 |
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs H Huang, Y Li, J Sun, X Zhu, J Zhang, L Luo, J Li, Z Wang IEEE Transactions on Parallel and Distributed Systems 34 (8), 2311-2324, 2023 | 2 | 2023 |
Cloud collectives: Towards cloud-aware collectives for ML workloads with rank reordering L Luo, J Nelson, A Krishnamurthy, L Ceze arXiv preprint arXiv:2105.14088, 2021 | 2 | 2021 |
Wukong: Towards a Scaling Law for Large-Scale Recommendation B Zhang, L Luo, Y Chen, J Nie, X Liu, D Guo, Y Zhao, S Li, Y Hao, Y Yao, ... arXiv preprint arXiv:2403.02545, 2024 | 1 | 2024 |
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation L Luo, B Zhang, M Tsang, Y Ma, CH Chu, Y Chen, S Li, Y Hao, Y Zhao, ... Proceedings of Machine Learning and Systems 6, 266-278, 2024 | | 2024 |
Characterizing and Taming Resolution in Convolutional Neural Networks E Yan, L Luo, L Ceze 2021 IEEE International Symposium on Workload Characterization (IISWC), 189-200, 2021 | | 2021 |
Towards More Efficient Communication for Distributed Learning Systems L Luo University of Washington, 2020 | | 2020 |