The deep learning compiler: A comprehensive survey M Li, Y Liu, X Liu, Q Sun, X You, H Yang, Z Luan, L Gan, G Yang, D Qian IEEE Transactions on Parallel and Distributed Systems 32 (3), 708-727, 2020 | 205 | 2020 |
Automatic code generation and optimization of large-scale stencil computation on many-core processors M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen, X You, X Liu, Z Luan, D Qian Proceedings of the 50th International Conference on Parallel Processing, 1-12, 2021 | 15 | 2021 |
SpTFS: Sparse tensor format selection for MTTKRP via deep learning Q Sun, Y Liu, M Dun, H Yang, Z Luan, L Gan, G Yang, D Qian SC20: International Conference for High Performance Computing, Networking …, 2020 | 15 | 2020 |
Highly scalable parallel genetic algorithm on Sunway many-core processors Z Xiao, X Liu, J Xu, Q Sun, L Gan Future Generation Computer Systems 114, 679-691, 2021 | 13 | 2021 |
SMQoS: Improving utilization and energy efficiency with QoS awareness on GPUs Q Sun, Y Liu, H Yang, Z Luan, D Qian 2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-5, 2019 | 12 | 2019 |
CoGNN: Efficient scheduling for concurrent GNN training on GPUs Q Sun, Y Liu, H Yang, R Zhang, M Dun, M Li, X Liu, W Xiao, Y Li, Z Luan, ... SC22: International Conference for High Performance Computing, Networking …, 2022 | 11 | 2022 |
Input-aware sparse tensor storage format selection for optimizing MTTKRP Q Sun, Y Liu, H Yang, M Dun, Z Luan, L Gan, G Yang, D Qian IEEE Transactions on Computers 71 (8), 1968-1981, 2021 | 10 | 2021 |
csTuner: Scalable auto-tuning framework for complex stencil computation on GPUs Q Sun, Y Liu, H Yang, Z Jiang, X Liu, M Dun, Z Luan, D Qian 2021 IEEE International Conference on Cluster Computing (CLUSTER), 192-203, 2021 | 8 | 2021 |
Improving thread-level parallelism in GPUs through expanding register file to scratchpad memory C Yu, Y Bai, Q Sun, H Yang ACM Transactions on Architecture and Code Optimization (TACO) 15 (4), 1-24, 2018 | 7 | 2018 |
Mimose: An input-aware checkpointing planner for efficient training on GPU J Liao, M Li, Q Sun, J Hao, F Yu, S Chen, Y Tao, Z Zhang, H Yang, Z Luan, ... arXiv preprint arXiv:2209.02478, 2022 | 4 | 2022 |
StencilMART: Predicting optimization selection for stencil computations across GPUs Q Sun, Y Liu, H Yang, Z Jiang, Z Luan, D Qian 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022 | 4 | 2022 |
Towards efficient canonical polyadic decomposition on Sunway many-core processor M Dun, Y Li, Q Sun, H Yang, W Li, Z Luan, L Gan, G Yang, D Qian Information Sciences 549, 221-248, 2021 | 4 | 2021 |
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU Q Sun, L Yi, H Yang, M Li, Z Luan, D Qian Parallel Computing 113, 102958, 2022 | 3 | 2022 |
Accelerating De Novo Assembler WTDBG2 on Commodity Servers M Dun, Y Li, X You, Q Sun, Z Luan, H Yang International Conference on Algorithms and Architectures for Parallel …, 2020 | 2 | 2020 |
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs Q Sun, Y Liu, H Yang, Z Jiang, Z Luan, D Qian IEEE Transactions on Parallel and Distributed Systems, 2023 | 1 | 2023 |
An optimized tensor completion library for multiple GPUs M Dun, Y Li, H Yang, Q Sun, Z Luan, D Qian Proceedings of the ACM International Conference on Supercomputing, 417-430, 2021 | 1 | 2021 |
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU J Liao, M Li, H Yang, Q Sun, B Sun, J Hao, T Feng, F Yu, S Chen, Y Tao, ... 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023 | | 2023 |
Towards Optimized Streaming Tensor Completion on multiple GPUs J Hao, H Yang, Q Sun, H Zhang, Z Luan, D Qian 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th …, 2022 | | 2022 |