Band: Coordinated multi-DNN inference on heterogeneous mobile processors

JS Jeong, J Lee, D Kim, C Jeon, C Jeong… - Proceedings of the 20th …, 2022 - dl.acm.org
The rapid development of deep learning algorithms, as well as innovative hardware
advancements, encourages multi-DNN workloads such as augmented reality applications …

AutoFL: Enabling heterogeneity-aware energy-efficient federated learning

YG Kim, CJ Wu - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org
Federated learning enables a cluster of decentralized mobile devices at the edge to
collaboratively train a shared machine learning model, while keeping all the raw training …

AutoScale: Energy efficiency optimization for stochastic edge inference using reinforcement learning

YG Kim, CJ Wu - 2020 53rd Annual IEEE/ACM International …, 2020 - ieeexplore.ieee.org
Deep learning inference is increasingly run at the edge. As the programming and system
stack support becomes mature, it enables acceleration opportunities in a mobile system …

Mandheling: Mixed-precision on-device DNN training with DSP offloading

D Xu, M Xu, Q Wang, S Wang, Y Ma, K Huang… - Proceedings of the 28th …, 2022 - dl.acm.org
This paper proposes Mandheling, the first system that enables highly resource-efficient on-
device training by orchestrating mixed-precision training with on-chip Digital Signal …

Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

D Xu, H Zhang, L Yang, R Liu, G Huang, M Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
On-device large language models (LLMs) are catalyzing novel mobile applications such as
UI task automation and personalized email auto-reply, without giving away users' private …

FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs

Z Tang, Y Wang, X He, L Zhang, X Pan, Q Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …

BlastNet: Exploiting duo-blocks for cross-processor real-time DNN inference

N Ling, X Huang, Z Zhao, N Guan, Z Yan… - Proceedings of the 20th …, 2022 - dl.acm.org
In recent years, Deep Neural Network (DNN) has been increasingly adopted by a wide
range of time-critical applications running on edge platforms with heterogeneous …

AdaptiveNet: Post-deployment neural architecture adaptation for diverse edge environments

H Wen, Y Li, Z Zhang, S Jiang, X Ye, Y Ouyang… - Proceedings of the 29th …, 2023 - dl.acm.org
Deep learning models are increasingly deployed to edge devices for real-time applications.
To ensure stable service quality across diverse edge environments, it is highly desirable to …

Efficient knowledge management for heterogeneous federated continual learning on resource-constrained edge devices

Z Yang, S Zhang, C Li, M Wang, H Wang… - Future Generation …, 2024 - Elsevier
Federated learning (FL) is a promising and privacy-preserving distributed learning method
that is widely deployed on edge devices. However, in practical applications, the data …

SLO-aware inference scheduler for heterogeneous processors in edge platforms

W Seo, S Cha, Y Kim, J Huh, J Park - ACM Transactions on Architecture …, 2021 - dl.acm.org
With the proliferation of applications with machine learning (ML), the importance of edge
platforms has been growing to process streaming sensor data locally without resorting to …