G Kornaros - IEEE Access, 2022 - ieeexplore.ieee.org
As Internet of Things (IoT) technology advances, billions of multidisciplinary smart devices act in concert, rarely requiring human intervention, posing significant challenges in …
Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can …
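As a rough illustration of the optimizer state this snippet refers to, the sketch below keeps an exponentially smoothed gradient sum (SGD with momentum) and a smoothed squared-gradient sum (Adam's second moment). The function names, hyperparameters, and toy update loop are illustrative only, not taken from the cited paper.

# Illustrative sketch of per-parameter optimizer state (not from the cited paper).
import numpy as np

def momentum_step(param, grad, state, lr=1e-2, beta=0.9):
    # Exponentially smoothed sum of past gradients (SGD with momentum).
    state["m"] = beta * state["m"] + grad
    return param - lr * state["m"]

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam additionally tracks the smoothed sum of squared gradients.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

param = np.zeros(4)
state = {"m": np.zeros(4), "v": np.zeros(4), "t": 0}
grad = np.ones(4)
param = adam_step(param, grad, state)  # state persists across steps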
Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a …
Processing In-Memory (PIM) has shown great potential to accelerate inference tasks of Convolutional Neural Networks (CNNs). However, existing PIM architectures do not support …
Optimizing distributed learning systems is an art of balancing computation and communication. There have been two lines of research that try to deal with slower …
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware. Through …
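The snippet stops before describing the format itself; MSFP belongs to the block floating point family, in which a group of values shares one exponent while each value keeps its own sign and a narrow mantissa. The sketch below is a generic shared-exponent quantizer along those lines, an assumed illustration of that family rather than the exact MSFP encoding.

# Generic block floating point sketch (an assumed illustration of the
# shared-exponent idea behind MSFP-style datatypes, not the exact format).
import numpy as np

def bfp_quantize(block, mantissa_bits=4):
    max_val = np.abs(block).max()
    if max_val == 0:
        return np.zeros_like(block)
    # Shared power-of-two scale chosen so the largest value fits in the mantissa range.
    shared_exp = np.ceil(np.log2(max_val / (2 ** (mantissa_bits - 1) - 1)))
    scale = 2.0 ** shared_exp
    mantissas = np.clip(np.round(block / scale),
                        -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1)
    return mantissas * scale  # all elements share one exponent (the scale)

x = np.random.randn(16).astype(np.float32)
print(np.abs(bfp_quantize(x) - x).max())  # error bounded by the shared scale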
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …
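To make "a linear layer for int8 quantized" matmuls concrete, the sketch below shows generic row-wise absmax int8 quantization of activations and weights, an int8 matrix multiply with int32 accumulation, and dequantization of the result. This is a generic illustration of int8 quantized linear layers, not the SwitchBack design itself.

# Generic int8-quantized linear layer sketch (not the SwitchBack layer itself).
import numpy as np

def quantize_rowwise(x):
    # Per-row absmax scaling into the int8 range [-127, 127].
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w):
    # x: (batch, in_features), w: (out_features, in_features)
    xq, xs = quantize_rowwise(x)
    wq, ws = quantize_rowwise(w)
    acc = xq.astype(np.int32) @ wq.astype(np.int32).T  # int8 matmul, int32 accumulate
    return acc.astype(np.float32) * xs * ws.T           # rescale back to float

x = np.random.randn(2, 8).astype(np.float32)
w = np.random.randn(4, 8).astype(np.float32)
print(np.abs(int8_linear(x, w) - x @ w.T).max())  # small quantization error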
S Chen, C Shen, L Zhang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Communication is widely known as the primary bottleneck of federated learning, and quantization of local model updates before uploading to the parameter server is an effective …
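The sketch below illustrates the idea of quantizing a local model update before uploading it to the parameter server, using a generic unbiased stochastic uniform quantizer; the bit width, helper names, and rounding scheme are illustrative assumptions, not necessarily the scheme analyzed in the cited paper.

# Minimal sketch: quantize a local model update before upload
# (generic stochastic uniform quantizer, not the cited paper's exact scheme).
import numpy as np

def quantize_update(delta, num_bits=4):
    levels = 2 ** num_bits - 1
    lo, hi = delta.min(), delta.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    normalized = (delta - lo) / scale
    q = np.floor(normalized + np.random.rand(*delta.shape)).astype(np.uint8)
    return q, lo, scale  # only q plus two scalars need to be transmitted

def dequantize_update(q, lo, scale):
    return q.astype(np.float32) * scale + lo

delta = np.random.randn(1000).astype(np.float32) * 0.01  # local model update
q, lo, scale = quantize_update(delta)
recovered = dequantize_update(q, lo, scale)
print(np.abs(recovered - delta).max())  # quantization error per coordinate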
M Mahmoud, I Edo, AH Zadeh… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware …