Haojun Xia
Verified email at uni.sydney.edu.au
Title
Cited by
Year
Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity
H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu, Y Li, W Lin, SL Song
arXiv preprint arXiv:2309.10285, 2023
Cited by: 18; Year: 2023
η-LSTM: Co-designing highly-efficient large LSTM training via exploiting memory-saving and architectural design opportunities
X Zhang, H Xia, D Zhuang, H Sun, X Fu, MB Taylor, SL Song
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021
Cited by: 17; Year: 2021
Shift-BNN: Highly-efficient probabilistic Bayesian neural network training via memory-friendly pattern retrieving
Q Wan, H Xia, X Zhang, L Wang, SL Song, X Fu
MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021
Cited by: 8; Year: 2021
FP6-LLM: Efficiently serving large language models through FP6-centric algorithm-system co-design
H Xia, Z Zheng, X Wu, S Chen, Z Yao, S Youn, A Bakhtiari, M Wyatt, ...
arXiv preprint arXiv:2401.14112, 2024
Cited by: 4; Year: 2024
ZeroQuant(4+2): Redefining LLMs quantization with a new FP6-centric strategy for diverse generative tasks
X Wu, H Xia, S Youn, Z Zheng, S Chen, A Bakhtiari, M Wyatt, Y He, ...
arXiv preprint arXiv:2312.08583, 2023
Cited by: 4; Year: 2023
Enabling fast and memory-efficient acceleration for pattern matching workloads: The lightweight automata processing engine
L Gong, C Wang, H Xia, X Chen, X Li, X Zhou
IEEE Transactions on Computers 72 (4), 1011-1025, 2022
Cited by: 4; Year: 2022
LAP: A lightweight automata processor for pattern matching tasks
H Xia, L Gong, C Wang, X Chen, X Zhou
2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 844-849, 2021
Cited by: 3; Year: 2021
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs
H Xia, Z Zheng, X Wu, S Chen, Z Yao, S Youn, A Bakhtiari, M Wyatt, ...
2024 USENIX Annual Technical Conference (USENIX ATC 24), 699-713, 2024
Year: 2024
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
D Zhuang, Z Zheng, H Xia, X Qiu, J Bai, W Lin, SL Song
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Year: 2024
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
D Zhuang, Z Zheng, H Xia, X Qiu, J Bai, W Lin