A survey on efficient inference for large language models Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ... arXiv preprint arXiv:2404.14294, 2024 | 11 | 2024 |
Evaluating quantized large language models S Li, X Ning, L Wang, T Liu, X Shi, S Yan, G Dai, H Yang, Y Wang Forty-first International Conference on Machine Learning, 2024 | 9 | 2024 |
LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K T Yuan, X Ning, D Zhou, Z Yang, S Li, M Zhuang, Z Tan, Z Yao, D Lin, B Li, ... arXiv preprint arXiv:2402.05136, 2024 | 8 | 2024 |
FlightLLM: Efficient large language model inference with a complete mapping flow on FPGAs S Zeng, J Liu, G Dai, X Yang, T Fu, H Wang, W Ma, H Sun, S Li, Z Huang, ... Proceedings of the 2024 ACM/SIGDA International Symposium on Field …, 2024 | 7 | 2024 |
LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment S Li, X Ning, K Hong, T Liu, L Wang, X Li, K Zhong, G Dai, H Yang, ... NeurIPS 2023 Efficient Natural Language and Speech Processing Workshop, 2023 | 6* | 2023 |
A unified FPGA virtualization framework for general-purpose deep neural networks in the cloud S Zeng, G Dai, H Sun, J Liu, S Li, G Ge, K Zhong, K Guo, Y Wang, H Yang ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (3), 1-31, 2021 | 4 | 2021 |
Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization J Li, S Li, J Xu, S Huang, Y Lian, J Liu, Y Wang, G Dai arXiv preprint arXiv:2311.16442, 2023 | 1 | 2023 |
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression T Fu, H Huang, X Ning, G Zhang, B Chen, T Wu, H Wang, Z Huang, S Li, ... arXiv preprint arXiv:2406.14909, 2024 | | 2024 |
Can LLMs Learn by Teaching? A Preliminary Study X Ning, Z Wang, S Li, Z Lin, P Yao, T Fu, MB Blaschko, G Dai, H Yang, ... arXiv preprint arXiv:2406.14629, 2024 | | 2024 |
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation T Zhao, T Fang, E Liu, W Rui, W Soedarmadji, S Li, Z Lin, G Dai, S Yan, ... arXiv preprint arXiv:2406.02540, 2024 | | 2024 |
Towards High-accuracy and Real-time Two-stage Small Object Detection on FPGA S Li, Z Zhu, H Sun, X Ning, G Dai, Y Hu, H Yang, Y Wang IEEE Transactions on Circuits and Systems for Video Technology, 2024 | | 2024 |
TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning S Li, X Ning, S Zhang, L Guo, T Zhao, H Yang, Y Wang Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024 | | 2024 |
Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation S Li, Z Zhu, Y Zhu, Q Zhu, J Zhang, W Sun, G Dai, F Qiao, H Yang, ... 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023 | | 2023 |