NAPEL: Near-memory computing application performance prediction via ensemble learning

G Singh, J Gómez-Luna, G Mariani… - Proceedings of the 56th …, 2019 - dl.acm.org
The cost of moving data between the memory/storage units and the compute units is a major
contributor to the execution time and energy consumption of modern workloads in …

Predicting the performance-cost trade-off of applications across multiple systems

A Nassereldine, S Diab, M Baydoun… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
In modern computing environments, users may have multiple systems accessible to them
such as local clusters, private clouds, or public clouds. This abundance of choices makes it …

Enabling scalable and adaptive machine learning training via serverless computing on public cloud

A Ali, X Ma, S Zawad, P Aditya, IE Akkus, R Chen… - Performance …, 2025 - Elsevier
In today's production machine learning (ML) systems, models are continuously trained,
improved, and deployed. ML design and training are becoming a continuous workflow of …

SMLT: A serverless framework for scalable and adaptive machine learning design and training

A Ali, S Zawad, P Aditya, IE Akkus, R Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
In today's production machine learning (ML) systems, models are continuously trained,
improved, and deployed. ML design and training are becoming a continuous workflow of …

Predicting cloud performance for HPC applications before deployment

G Mariani, A Anghel, R Jongerius… - Future Generation …, 2018 - Elsevier
To reduce the capital investment required to acquire and maintain a high performance
computing cluster, today many HPC users are moving to cloud. When deploying an …

AI Gauge: Runtime estimation for deep learning in the cloud

P Dube, T Suk, C Wang - 2019 31st International Symposium …, 2019 - ieeexplore.ieee.org
Major cloud providers, including IBM Cloud, Amazon Web Services, Microsoft Azure, and
Google Cloud, offer services to train, debug, store, and deploy machine learning models at …

NMPO: Near-memory computing profiling and offloading

S Corda, M Kumaraswamy, AJ Awan… - 2021 24th Euromicro …, 2021 - ieeexplore.ieee.org
Real-world applications are now processing big-data sets, often bottlenecked by the data
movement between the compute units and the main memory. Near-memory computing …

Distributed matrix multiplication performance estimator for machine learning jobs in cloud computing

M Son, K Lee - 2018 IEEE 11th International Conference on …, 2018 - ieeexplore.ieee.org
Matrix multiplication is an important kernel task in many machine learning algorithms. As the
size of input datasets increases, multiple workloads are analyzed in large-scale distributed …

NDP-RANK: Prediction and ranking of NDP systems performance using machine learning

V Iskandar, MA Abd El Ghany, D Goehringer - Microprocessors and …, 2023 - Elsevier
The near-data processing (NDP) paradigm has recently gained popularity as a promising
method for mitigating the memory wall challenges of future computing systems. Modern 3D …

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning

G Singh, D Diamantopoulos… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
Machine learning has recently gained traction as a way to overcome the slow accelerator
generation and implementation process on an FPGA. It can be used to build performance …