Low-latency federated learning with DNN partition in distributed industrial IoT networks

X Deng, J Li, C Ma, K Wei, L Shi… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Federated Learning (FL) empowers Industrial Internet of Things (IIoT) with distributed
intelligence of industrial automation thanks to its capability of distributed machine learning …

Scalable graph convolutional network training on distributed-memory systems

GV Demirci, A Haldar, H Ferhatosmanoglu - arXiv preprint arXiv …, 2022 - arxiv.org
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
The large data sizes of graphs and their vertex features make scalable training algorithms …

Dynamic layer-wise sparsification for distributed deep learning

H Zhang, T Wu, Z Ma, F Li, J Liu - Future Generation Computer Systems, 2023 - Elsevier
Distributed stochastic gradient descent (SGD) algorithms are becoming popular in speeding
up deep learning model training by employing multiple computational devices (named …

D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks

R Guliyev, A Haldar, H Ferhatosmanoglu - arXiv preprint arXiv:2409.09079, 2024 - arxiv.org
Graph Neural Network (GNN) models on streaming graphs entail algorithmic challenges to
continuously capture their dynamic state, as well as systems challenges to optimize latency …

A lightweight self-supervised representation learning algorithm for scene classification in spaceborne SAR and optical images

X Xiao, C Li, Y Lei - Remote Sensing, 2022 - mdpi.com
Despite the increasing number of spaceborne synthetic aperture radar (SAR) images and
optical images, only a small amount of annotated data can be used directly for scene classification tasks …

Self-Compressing Neural Networks

S Cséfalvay, J Imber - arXiv preprint arXiv:2301.13142, 2023 - arxiv.org
This work focuses on reducing neural network size, which is a major driver of neural network
execution time, power consumption, bandwidth, and memory footprint. A key challenge is to …

Mapping and optimization method of SpMV on Multi-DSP accelerator

S Liu, Y Cao, S Sun - Electronics, 2022 - mdpi.com
Sparse matrix-vector multiplication (SpMV) computes the product of a sparse matrix and a dense
vector, and the sparsity of the matrix often exceeds 90%. Usually, the sparse …

SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels

N Abubaker, T Hoefler - arXiv preprint arXiv:2404.19638, 2024 - arxiv.org
Existing 3D algorithms for distributed-memory sparse kernels suffer from limited scalability
due to reliance on bulk sparsity-agnostic communication. While easier to use, sparsity …

FSD-Inference: Fully Serverless Distributed Inference with Scalable Cloud Communication

J Oakley, H Ferhatosmanoglu - arXiv preprint arXiv:2403.15195, 2024 - arxiv.org
Serverless computing offers attractive scalability, elasticity and cost-effectiveness. However,
constraints on memory, CPU and function runtime have hindered its adoption for data …

Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs

G Zeng, Y Zou - Electronics, 2023 - mdpi.com
Sparse matrix-vector multiplication (SpMV) is central to many scientific, engineering, and
other applications, including machine learning. Compressed Sparse Row (CSR) is a widely …