QUIC-FL: Quick Unbiased Compression for Federated Learning

RB Basat, S Vargaftik, A Portnoy, G Einziger… - arXiv preprint arXiv …, 2022 - arxiv.org
Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

H Li, Y Xu, J Chen, R Dwivedula, W Wu, K He… - arXiv preprint arXiv …, 2024 - arxiv.org
As deep neural networks (DNNs) grow in complexity and size, the resultant increase in
communication overhead during distributed training has become a significant bottleneck …

Optimal and Near-Optimal Adaptive Vector Quantization

R Ben-Basat, Y Ben-Itzhak, M Mitzenmacher… - arXiv preprint arXiv …, 2024 - arxiv.org
Quantization is a fundamental optimization for many machine-learning use cases, including
compressing gradients, model weights and activations, and datasets. The most accurate …

Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression

W Han, S Vargaftik, M Mitzenmacher, B Karp… - arXiv preprint arXiv …, 2024 - arxiv.org
Gradient aggregation has long been identified as a major bottleneck in today's large-scale
distributed machine learning training systems. One promising solution to mitigate such …

Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference

Z Zhang, H Shen - arXiv preprint arXiv:2408.04107, 2024 - arxiv.org
In large-language models, memory constraints in the key-value cache (KVC) pose a
challenge during inference, especially with long prompts. In this work, we observed that …

Accelerating Federated Learning with Quick Distributed Mean Estimation

R Ben-Basat, A Portnoy, G Einziger… - Forty-first International … - openreview.net
Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …

Telemetry for Next-Generation Networks

J Langlet - 2024 - qmro.qmul.ac.uk
Software-defined networking enables tight integration between packet-processing hardware
and centralized controllers, highlighting the importance of deep network insight for informed …

Approximate Computing and In-Memory Computing: The Best of the Two Worlds!

MEF Essa - 2024 - search.proquest.com
Machine learning (ML) has become ubiquitous, integrating into numerous real-life
applications. However, meeting the computational demands of ML systems is challenging …