An overview of efficient interconnection networks for deep neural network accelerators

SM Nabavinejad, M Baharloo, KC Chen… - IEEE Journal on …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have shown significant advantages in many domains, such
as pattern recognition, prediction, and control optimization. The edge computing demand in …

PyTorch distributed: Experiences on accelerating data parallel training

S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the design, implementation, and evaluation of the PyTorch distributed
data parallel module. PyTorch is a widely-adopted scientific computing package used in …

Performance optimization of federated person re-identification via benchmark analysis

W Zhuang, Y Wen, X Zhang, X Gan, D Yin… - Proceedings of the 28th …, 2020 - dl.acm.org
Federated learning is a privacy-preserving machine learning technique that learns a shared
model across decentralized clients. It can alleviate privacy concerns of personal re …

A Survey on Scheduling Techniques in Computing and Network Convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks

J Bae, J Lee, Y Jin, S Son, S Kim, H Jang… - … USENIX Conference on …, 2021 - usenix.org
Deep neural networks (DNNs) are widely used in various AI application domains such as
computer vision, natural language processing, autonomous driving, and bioinformatics. As …

Optimizing performance of federated person re-identification: Benchmarking and analysis

W Zhuang, X Gan, Y Wen, S Zhang - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Increasingly stringent data privacy regulations limit the development of person re-
identification (ReID) because person ReID training requires centralizing an enormous …

Astraea: A fair deep learning scheduler for multi-tenant GPU clusters

Z Ye, P Sun, W Gao, T Zhang, X Wang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Modern GPU clusters are designed to support distributed Deep Learning jobs from multiple
tenants concurrently. Each tenant may have varied and dynamic resource demands …

Superconducting hyperdimensional associative memory circuit for scalable machine learning

K Huch, P Gonzalez-Guerrero, D Lyles… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
We propose a generalized architecture for the first rapid-single-flux-quantum (RSFQ)
associative memory circuit. The circuit employs hyperdimensional computing (HDC), a …

Lossless medical image compression based on anatomical information and deep neural networks

Q Min, X Wang, B Huang, Z Zhou - Biomedical Signal Processing and …, 2022 - Elsevier
Modern imaging modalities generate large volumes of medical data that place a heavy
burden on both storage and transmission. Consequently, image data compression is a key …

Reducing deep learning network structure through variable reduction methods in crop modeling

B Saravi, AP Nejadhashemi, P Jha, B Tang - Artificial Intelligence in …, 2021 - Elsevier
Crop models are widely used to predict plant growth, water input requirements, and yield.
However, existing models are very complex and require hundreds of variables to perform …