AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration J Lin, J Tang, H Tang, S Yang, WM Chen, WC Wang, G Xiao, X Dang, ... MLSys 2024, Best Paper Award, 2023 | 223 | 2023 |
FlatFormer: Flattened window attention for efficient point cloud transformer Z Liu, X Yang, H Tang, S Yang, S Han Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 40 | 2023 |
TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs H Tang, S Yang, Z Liu, K Hong, Z Yu, X Li, G Dai, Y Wang, S Han Proceedings of the 56th Annual IEEE/ACM International Symposium on …, 2023 | 16* | 2023 |
Heuristic adaptability to input dynamics for SpMM on GPUs G Dai, G Huang, S Yang, Z Yu, H Zhang, Y Ding, Y Xie, H Yang, Y Wang Proceedings of the 59th ACM/IEEE Design Automation Conference, 595-600, 2022 | 11 | 2022 |
HyperGef: A framework enabling efficient fusion for hypergraph neural network on GPUs Z Yu, G Dai, S Yang, G Zhang, H Zhang, F Zhu, J Yang, J Zhao, Y Wang Proceedings of Machine Learning and Systems 5, 387-399, 2023 | 3 | 2023 |
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan, S Han arXiv preprint arXiv:2405.04532, 2024 | 2 | 2024 |
Sparse Refinement for Efficient High-Resolution Semantic Segmentation Z Liu, Z Zhang, S Yang, H Tang, C Xu, K Keutzer, S Han | | 2023 |
CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory T Fu, C Wei, Z Zhu, S Yang, Z Yu, G Dai, H Yang, Y Wang 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6, 2023 | | 2023 |