Shang Yang
Title
Cited by
Year
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
J Lin, J Tang, H Tang, S Yang, WM Chen, WC Wang, G Xiao, X Dang, ...
MLSys 2024, Best Paper Award, 2023
Cited by 223 · 2023
FlatFormer: Flattened window attention for efficient point cloud transformer
Z Liu, X Yang, H Tang, S Yang, S Han
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by 40 · 2023
TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs
H Tang, S Yang, Z Liu, K Hong, Z Yu, X Li, G Dai, Y Wang, S Han
Proceedings of the 56th Annual IEEE/ACM International Symposium on …, 2023
Cited by 16* · 2023
Heuristic adaptability to input dynamics for SpMM on GPUs
G Dai, G Huang, S Yang, Z Yu, H Zhang, Y Ding, Y Xie, H Yang, Y Wang
Proceedings of the 59th ACM/IEEE Design Automation Conference, 595-600, 2022
Cited by 11 · 2022
HyperGef: A framework enabling efficient fusion for hypergraph neural network on GPUs
Z Yu, G Dai, S Yang, G Zhang, H Zhang, F Zhu, J Yang, J Zhao, Y Wang
Proceedings of Machine Learning and Systems 5, 387-399, 2023
Cited by 3 · 2023
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan, S Han
arXiv preprint arXiv:2405.04532, 2024
Cited by 2 · 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Z Liu, Z Zhang, S Yang, H Tang, C Xu, K Keutzer, S Han
2023
CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory
T Fu, C Wei, Z Zhu, S Yang, Z Yu, G Dai, H Yang, Y Wang
2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6, 2023
2023
Articles 1–8