THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

RB Basat, S Vargaftik, A Portnoy, G Einziger… - arXiv preprint arXiv …, 2022 - arxiv.org

Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …

Cited by: 12 (2 versions)

[PDF] arxiv.org

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

H Li, Y Xu, J Chen, R Dwivedula, W Wu, K He… - arXiv preprint arXiv …, 2024 - arxiv.org

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in
communication overhead during distributed training has become a significant bottleneck …

Cited by: 3 (2 versions)

[PDF] arxiv.org

Optimal and Near-Optimal Adaptive Vector Quantization

R Ben-Basat, Y Ben-Itzhak, M Mitzenmacher… - arXiv preprint arXiv …, 2024 - arxiv.org

Quantization is a fundamental optimization for many machine-learning use cases, including
compressing gradients, model weights and activations, and datasets. The most accurate …

Cited by: 2 (2 versions)

[PDF] arxiv.org

Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression

W Han, S Vargaftik, M Mitzenmacher, B Karp… - arXiv preprint arXiv …, 2024 - arxiv.org

Gradient aggregation has long been identified as a major bottleneck in today's large-scale
distributed machine learning training systems. One promising solution to mitigate such …

Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference

Z Zhang, H Shen - arXiv preprint arXiv:2408.04107, 2024 - arxiv.org

In large-language models, memory constraints in the key-value cache (KVC) pose a
challenge during inference, especially with long prompts. In this work, we observed that …

Accelerating Federated Learning with Quick Distributed Mean Estimation

R Ben-Basat, A Portnoy, G Einziger… - Forty-first International … - openreview.net

Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …

Cited by: 1 (2 versions)

[PDF] qmul.ac.uk

Telemetry for Next-Generation Networks

J Langlet - 2024 - qmro.qmul.ac.uk

Software-defined networking enables tight integration between packet-processing hardware
and centralized controllers, highlighting the importance of deep network insight for informed …

Approximate Computing and In-Memory Computing: The Best of the Two Worlds!

MEF Essa - 2024 - search.proquest.com

Machine learning (ML) has become ubiquitous, integrating into numerous real-life
applications. However, meeting the computational demands of ML systems is challenging …
