OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model
size has grown. The size of LLMs grows by 240× every two years, which outpaces the …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

SQuant: On-the-fly data-free quantization via diagonal Hessian approximation

C Guo, Y Qiu, J Leng, X Gao, C Zhang, Y Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Quantization of deep neural networks (DNNs) has been proven effective for compressing and
accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the …

A survey for efficient open domain question answering

Q Zhang, S Chen, D Xu, Q Cao, X Chen, T Cohn… - arXiv preprint arXiv …, 2022 - arxiv.org
Open domain question answering (ODQA) is a longstanding task aimed at answering factual
questions from a large knowledge corpus without any explicit evidence in natural language …

Transkimmer: Transformer learns to layer-wise skim

Y Guan, Z Li, J Leng, Z Lin, M Guo - arXiv preprint arXiv:2205.07324, 2022 - arxiv.org
The Transformer architecture has become the de facto model for many machine learning tasks,
from natural language processing to computer vision. As such, improving its computational …

ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …

On the effectiveness of pre-trained language models for legal natural language processing: An empirical study

D Song, S Gao, B He, F Schilder - IEEE Access, 2022 - ieeexplore.ieee.org
We present the first comprehensive empirical evaluation of pre-trained language models
(PLMs) for legal natural language processing (NLP) in order to examine their effectiveness …

Chatbots as problem solvers: Playing twenty questions with role reversals

D Noever, F McKee - arXiv preprint arXiv:2301.01743, 2023 - arxiv.org
New chat AI applications like ChatGPT offer an advanced understanding of question context
and memory across multi-step tasks, such that experiments can test their deductive reasoning …

DC-Graph: a chunk optimization model based on document classification and graph learning

J Zhou, G Zhang, O Alfarraj, A Tolba, X Li… - Artificial Intelligence …, 2024 - Springer
Existing machine reading comprehension methods use a fixed stride to chunk long texts,
which leads to missing contextual information at the boundaries of the chunks and a lack of …

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences

S Zhang, W Cui, Q Chen, Z Zhang, Y Guan… - Proceedings of the 36th …, 2022 - dl.acm.org
In emerging DNN serving systems, queries are usually batched to fully leverage hardware
resources, and all the queries in a batch run through the complete model and return at the …