TVM: An automated End-to-End optimizing compiler for deep learning T Chen, T Moreau, Z Jiang, L Zheng, E Yan, H Shen, M Cowan, L Wang, ... 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2018 | 1702 | 2018 |
TVM: end-to-end optimization stack for deep learning T Chen, T Moreau, Z Jiang, H Shen, EQ Yan, L Wang, Y Hu, L Ceze, ... arXiv preprint arXiv:1802.04799 11 (2018), 20, 2018 | 265 | 2018 |
HAWQ-V3: Dyadic neural network quantization Z Yao, Z Dong, Z Zheng, A Gholami, J Yu, E Tan, L Wang, Q Huang, ... International Conference on Machine Learning, 11875-11886, 2021 | 220 | 2021 |
Gunrock: GPU graph analytics Y Wang, Y Pan, A Davidson, Y Wu, C Yang, L Wang, M Osama, C Yuan, ... ACM Transactions on Parallel Computing (TOPC) 4 (1), 1-49, 2017 | 154 | 2017 |
A comparative study on exact triangle counting algorithms on the GPU L Wang, Y Wang, C Yang, JD Owens Proceedings of the ACM Workshop on High Performance Graph Processing, 1-8, 2016 | 70 | 2016 |
A unified optimization approach for CNN model inference on integrated GPUs L Wang, Z Chen, Y Liu, Y Wang, L Zheng, M Li, Y Wang Proceedings of the 48th International Conference on Parallel Processing, 1-10, 2019 | 39 | 2019 |
Bolt: Bridging the gap between auto-tuners and hardware-native performance J Xing, L Wang, S Zhang, J Chen, A Chen, Y Zhu Proceedings of Machine Learning and Systems 4, 204-216, 2022 | 34 | 2022 |
ByteTransformer: A high-performance transformer boosted for variable-length inputs Y Zhai, C Jiang, L Wang, X Jia, S Zhang, Z Chen, X Liu, Y Zhu 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023 | 28 | 2023 |
UNIT: Unifying tensorized instruction compilation J Weng, A Jain, J Wang, L Wang, Y Wang, T Nowatzki 2021 IEEE/ACM International Symposium on Code Generation and Optimization …, 2021 | 28 | 2021 |
Fast parallel suffix array on the GPU L Wang, S Baxter, JD Owens European Conference on Parallel Processing, 573-587, 2015 | 22 | 2015 |
Fast parallel skew and prefix‐doubling suffix array construction on the GPU L Wang, S Baxter, JD Owens Concurrency and Computation: Practice and Experience 28 (12), 3466-3484, 2016 | 19 | 2016 |
Fast parallel subgraph matching on the GPU L Wang, Y Wang, JD Owens HPDC, 2016 | 18 | 2016 |
Fast Gunrock subgraph matching (GSM) on GPUs L Wang, JD Owens arXiv preprint arXiv:2003.01527, 2020 | 15 | 2020 |
Fast BFS-based triangle counting on GPUs L Wang, JD Owens 2019 IEEE High Performance Extreme Computing Conference (HPEC), 1-6, 2019 | 9 | 2019 |
Optimal message scheduling for aggregation L Wang, M Li, E Liberty, A Smola | 6 | 2018 |