| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Y Fu, P Bailis, I Stoica, H Zhang | arXiv preprint arXiv:2402.02057 | 59* | 2024 |
| When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models | H You, Y Fu, Z Wang, A Yazdanbakhsh | arXiv preprint arXiv:2406.07368 | 2 | 2024 |
| ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization | H You, Y Guo, Y Fu, W Zhou, H Shi, X Zhang, S Kundu, A Yazdanbakhsh, ... | arXiv preprint arXiv:2406.05981 | 1 | 2024 |
| Efficient LLM Scheduling by Learning to Rank | Y Fu, S Zhu, R Su, A Qiao, I Stoica, H Zhang | arXiv preprint arXiv:2408.15792 | | 2024 |
| Neuron Sensitivity Guided Test Case Selection | D Huang, Q Bu, Y Fu, Y Qing, X Xie, J Chen, H Cui | ACM Transactions on Software Engineering and Methodology, 1 | | 2024 |
| AMPipe: Accelerating MoE Model Training with Intra-Block Pipelining | Y Fu, Y Qing, S Zhao, F Li, B Xiao, D Huang, H Cui | | | |