Qingxiao Sun
Verified email at cup.edu.cn
Title | Cited by | Year
The deep learning compiler: A comprehensive survey
M Li, Y Liu, X Liu, Q Sun, X You, H Yang, Z Luan, L Gan, G Yang, D Qian
IEEE Transactions on Parallel and Distributed Systems 32 (3), 708-727, 2020
Cited by: 205 | Year: 2020
Automatic code generation and optimization of large-scale stencil computation on many-core processors
M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen, X You, X Liu, Z Luan, D Qian
Proceedings of the 50th International Conference on Parallel Processing, 1-12, 2021
Cited by: 15 | Year: 2021
SpTFS: Sparse tensor format selection for MTTKRP via deep learning
Q Sun, Y Liu, M Dun, H Yang, Z Luan, L Gan, G Yang, D Qian
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by: 15 | Year: 2020
Highly scalable parallel genetic algorithm on Sunway many-core processors
Z Xiao, X Liu, J Xu, Q Sun, L Gan
Future Generation Computer Systems 114, 679-691, 2021
Cited by: 13 | Year: 2021
SMQoS: Improving utilization and energy efficiency with QoS awareness on GPUs
Q Sun, Y Liu, H Yang, Z Luan, D Qian
2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-5, 2019
Cited by: 12 | Year: 2019
CoGNN: Efficient scheduling for concurrent GNN training on GPUs
Q Sun, Y Liu, H Yang, R Zhang, M Dun, M Li, X Liu, W Xiao, Y Li, Z Luan, ...
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 11 | Year: 2022
Input-aware sparse tensor storage format selection for optimizing MTTKRP
Q Sun, Y Liu, H Yang, M Dun, Z Luan, L Gan, G Yang, D Qian
IEEE Transactions on Computers 71 (8), 1968-1981, 2021
Cited by: 10 | Year: 2021
csTuner: Scalable auto-tuning framework for complex stencil computation on GPUs
Q Sun, Y Liu, H Yang, Z Jiang, X Liu, M Dun, Z Luan, D Qian
2021 IEEE International Conference on Cluster Computing (CLUSTER), 192-203, 2021
Cited by: 8 | Year: 2021
Improving thread-level parallelism in GPUs through expanding register file to scratchpad memory
C Yu, Y Bai, Q Sun, H Yang
ACM Transactions on Architecture and Code Optimization (TACO) 15 (4), 1-24, 2018
Cited by: 7 | Year: 2018
Mimose: An input-aware checkpointing planner for efficient training on GPU
J Liao, M Li, Q Sun, J Hao, F Yu, S Chen, Y Tao, Z Zhang, H Yang, Z Luan, ...
arXiv preprint arXiv:2209.02478, 2022
Cited by: 4 | Year: 2022
StencilMART: Predicting optimization selection for stencil computations across GPUs
Q Sun, Y Liu, H Yang, Z Jiang, Z Luan, D Qian
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022
Cited by: 4 | Year: 2022
Towards efficient canonical polyadic decomposition on Sunway many-core processor
M Dun, Y Li, Q Sun, H Yang, W Li, Z Luan, L Gan, G Yang, D Qian
Information Sciences 549, 221-248, 2021
Cited by: 4 | Year: 2021
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU
Q Sun, L Yi, H Yang, M Li, Z Luan, D Qian
Parallel Computing 113, 102958, 2022
Cited by: 3 | Year: 2022
Accelerating De Novo Assembler WTDBG2 on Commodity Servers
M Dun, Y Li, X You, Q Sun, Z Luan, H Yang
International Conference on Algorithms and Architectures for Parallel …, 2020
Cited by: 2 | Year: 2020
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs
Q Sun, Y Liu, H Yang, Z Jiang, Z Luan, D Qian
IEEE Transactions on Parallel and Distributed Systems, 2023
Cited by: 1 | Year: 2023
An optimized tensor completion library for multiple GPUs
M Dun, Y Li, H Yang, Q Sun, Z Luan, D Qian
Proceedings of the ACM International Conference on Supercomputing, 417-430, 2021
Cited by: 1 | Year: 2021
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU
J Liao, M Li, H Yang, Q Sun, B Sun, J Hao, T Feng, F Yu, S Chen, Y Tao, ...
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023
Year: 2023
Towards Optimized Streaming Tensor Completion on multiple GPUs
J Hao, H Yang, Q Sun, H Zhang, Z Luan, D Qian
2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th …, 2022
Year: 2022
Articles 1–18