Deja vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Image super-resolution with non-local sparse attention

Y Mei, Y Fan, Y Zhou - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Both non-local (NL) operation and sparse representation are crucial for Single Image Super-
Resolution (SISR). In this paper, we investigate their combinations and propose a novel Non …

Reformer: The efficient transformer

N Kitaev, Ł Kaiser, A Levskaya - arXiv preprint arXiv:2001.04451, 2020 - arxiv.org
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but
training these models can be prohibitively costly, especially on long sequences. We …

The limitations of federated learning in sybil settings

C Fung, CJM Yoon, I Beschastnikh - 23rd International Symposium on …, 2020 - usenix.org
Federated learning over distributed multi-party data is an emerging paradigm that iteratively
aggregates updates from a group of devices to train a globally shared model. Relying on a …

ETC: Encoding long and structured inputs in transformers

J Ainslie, S Ontanon, C Alberti, V Cvicek… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer models have advanced the state of the art in many Natural Language
Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended …

Accelerating large-scale inference with anisotropic vector quantization

R Guo, P Sun, E Lindgren, Q Geng… - International …, 2020 - proceedings.mlr.press
Quantization based techniques are the current state-of-the-art for scaling maximum inner
product search to massive databases. Traditional approaches to quantization aim to …

Mitigating sybils in federated learning poisoning

C Fung, CJM Yoon, I Beschastnikh - arXiv preprint arXiv:1808.04866, 2018 - arxiv.org
Machine learning (ML) over distributed multi-party data is required for a variety of domains.
Existing approaches, such as federated learning, collect the outputs computed by a group of …

Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning

N Papernot, P McDaniel - arXiv preprint arXiv:1803.04765, 2018 - arxiv.org
Deep neural networks (DNNs) enable innovative applications of machine learning like
image recognition, machine translation, or malware detection. However, deep learning is …

Scatterbrain: Unifying sparse and low-rank attention

B Chen, T Dao, E Winsor, Z Song… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent advances in efficient Transformers have exploited either the sparsity or low-rank
properties of attention matrices to reduce the computational and memory bottlenecks of …

Survey of vector database management systems

JJ Pan, J Wang, G Li - The VLDB Journal, 2024 - Springer
There are now over 20 commercial vector database management systems (VDBMSs), all
produced within the past five years. But embedding-based retrieval has been studied for …