Accelerating sparse DNN models without hardware-support via tile-wise sparsity C Guo, BY Hsueh, J Leng, Y Qiu, Y Guan, Z Wang, X Jia, X Li, M Guo, ... Proceedings of the International Conference for High Performance Computing …, 2020 | 86 | 2020 |
Dual-side sparse tensor core Y Wang, C Zhang, Z Xie, C Guo, Y Liu, J Leng 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 68 | 2021 |
SQuant: On-the-fly data-free quantization via diagonal Hessian approximation C Guo, Y Qiu, J Leng, X Gao, C Zhang, Y Liu, F Yang, Y Zhu, M Guo arXiv preprint arXiv:2202.07471, 2022 | 61 | 2022 |
Adversarial defense through network profiling based path extraction Y Qiu, J Leng, C Guo, Q Chen, C Li, M Guo, Y Zhu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 59 | 2019 |
OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization C Guo, J Tang, W Hu, J Leng, C Zhang, F Yang, Y Liu, M Guo, Y Zhu Proceedings of the 50th Annual International Symposium on Computer …, 2023 | 44 | 2023 |
Characterizing and demystifying the implicit convolution algorithm on commercial matrix-multiplication accelerators Y Zhou, M Yang, C Guo, J Leng, Y Liang, Q Chen, M Guo, Y Zhu 2021 IEEE International Symposium on Workload Characterization (IISWC), 214-225, 2021 | 34 | 2021 |
ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization C Guo, C Zhang, J Leng, Z Liu, F Yang, Y Liu, M Guo, Y Zhu 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO …, 2022 | 28 | 2022 |
Balancing efficiency and flexibility for DNN acceleration via temporal GPU-systolic array integration C Guo, Y Zhou, J Leng, Y Zhu, Z Du, Q Chen, C Li, B Yao, M Guo 2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020 | 28 | 2020 |
Efficient Adaptive Activation Rounding for Post-Training Quantization Z Li, C Guo, Z Zhu, Y Zhou, Y Qiu, X Gao, J Leng, M Guo arXiv preprint arXiv:2208.11945, 2022 | 8 | 2022 |
Nesting forward automatic differentiation for memory-efficient deep neural network training C Guo, Y Qiu, J Leng, C Zhang, Y Cao, Q Zhang, Y Liu, F Yang, M Guo 2022 IEEE 40th International Conference on Computer Design (ICCD), 738-745, 2022 | 7 | 2022 |
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching C Guo, R Zhang, J Xu, J Leng, Z Liu, Z Huang, M Guo, H Wu, S Zhao, ... Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 5 | 2024 |
JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping Z Liu, W Ni, J Leng, Y Feng, C Guo, Q Chen, C Li, M Guo, Y Zhu Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 3 | 2024 |
Accelerating sparse DNNs based on tiled GEMM C Guo, F Xue, J Leng, Y Qiu, Y Guan, W Cui, Q Chen, M Guo IEEE Transactions on Computers, 2024 | 3 | 2024 |
Towards reliable AI applications via algorithm-based fault tolerance on NVDLA MT Sanic, C Guo, J Leng, M Guo, W Ma 2022 18th International Conference on Mobility, Sensing and Networking (MSN …, 2022 | 1 | 2022 |
AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs Y Zhou, Y Song, J Leng, Z Liu, W Cui, Z Zhang, C Guo, Q Chen, L Li, ... Proceedings of the 20th ACM International Conference on Computing Frontiers …, 2023 | | 2023 |