Title | Authors | Venue | Cited by | Year |
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections | H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ... | 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021 | 60 | 2021 |
BaGuaLu: targeting brain scale pretrained models with over 37 million cores | Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ... | Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 42 | 2022 |
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs | S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang | Proceedings of the 43rd ACM SIGPLAN International Conference on Programming …, 2022 | 9 | 2022 |
Vapro: Performance variance detection and diagnosis for production-run parallel applications | L Zheng, J Zhai, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen | Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022 | 5 | 2022 |
EINNET: Optimizing Tensor Programs with Derivation-Based Transformations | L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Huang, X Miao, S Tang, ... | 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 4 | 2023 |
OLLIE: Derivation-based tensor program optimizer | L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ... | arXiv preprint arXiv:2208.02025, 2022 | 2 | 2022 |
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Tsinghua University | C Zhang, C Zhao, J He, S Chen, L Zheng, K Huang, W Han, J Zhai | IEEE Transactions on Parallel and Distributed Systems 32 (11), 2631-2634, 2021 | 2 | 2021 |
Optimizing DNNs with partially equivalent transformations and automated corrections | H Wang, J Zhai, M Gao, F Zhang, T Wang, Z Ma, S Tang, L Zheng, ... | IEEE Transactions on Computers, 2023 | 1 | 2023 |
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR | Z Ma, H Wang, J Xing, L Zheng, C Zhang, H Cao, K Huang, S Tang, ... | arXiv preprint arXiv:2307.04995, 2023 | 1 | 2023 |
Detecting performance variance for parallel applications without source code | J Zhai, L Zheng, F Zhang, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen | IEEE Transactions on Parallel and Distributed Systems 33 (12), 4239-4255, 2022 | 1 | 2022 |
Leveraging code snippets to detect variations in the performance of HPC systems | J Zhai, L Zheng, J Sun, F Zhang, X Tang, X Qian, B He, W Xue, W Chen, ... | IEEE Transactions on Parallel and Distributed Systems 33 (12), 3558-3574, 2022 | 1 | 2022 |
Optimal Kernel Orchestration for Tensor Programs with Korch | M Hu, A Venkatram, S Biswas, B Marimuthu, B Hou, G Oliaro, H Wang, ... | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | | 2024 |
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations | K Huang, J Zhai, L Zheng, H Wang, Y Jin, Q Zhang, R Zhang, Z Zheng, ... | Proceedings of the Nineteenth European Conference on Computer Systems, 1-17, 2024 | | 2024 |
Student Cluster Competition 2018, Team Tsinghua University: Reproducing performance of multi-physics simulations of the Tsunamigenic 2004 Sumatra megathrust earthquake on the … | J He, C Zhao, J Yu, X Yu, L Zheng, C Lou, S Tang, W Han, J Zhai | Parallel Computing 90, 102570, 2019 | | 2019 |