ArchGym: An open-source gymnasium for machine learning assisted architecture design

S Krishnan, A Yazdanbakhsh, S Prakash… - Proceedings of the 50th …, 2023 - dl.acm.org
Machine learning (ML) has become a prevalent approach to tame the complexity of design
space exploration for domain-specific architectures. While appealing, using ML for design …

Probabilistic scheduling in high-level synthesis

J Cheng, J Wickerson… - 2021 IEEE 29th Annual …, 2021 - ieeexplore.ieee.org
High-level synthesis (HLS) tools automatically transform a high-level program, for example
in C/C++, into a low-level hardware description. A key challenge in HLS tools is scheduling …

PYTHIA: an oracle to guide runtime system decisions

A Colin, F Trahay, D Conan - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Runtime systems are commonly used by parallel applications in order to efficiently exploit
the underlying hardware resources. A runtime system hides the complexity of the …

Automatic tuning of TensorFlow's CPU backend using gradient-free optimization algorithms

D Mebratu, N Hasabnis, P Mercati, G Sharma… - … Computing: ISC High …, 2021 - Springer
Modern deep learning (DL) applications are built using DL libraries and frameworks such as
TensorFlow and PyTorch. These frameworks have complex parameters and tuning them to …

Collective Mind, Part II: Towards performance- and cost-aware software engineering as a natural science

G Fursin, A Memon, C Guillon, A Lokhmotov - arXiv preprint arXiv …, 2015 - arxiv.org
Nowadays, engineers have to develop software often without even knowing which hardware
it will eventually run on in numerous mobile phones, tablets, desktops, laptops, data centers …

CompOFA: Compound once-for-all networks for faster multi-platform deployment

M Sahni, S Varshini, A Khare, A Tumanov - arXiv preprint arXiv …, 2021 - arxiv.org
The emergence of CNNs in mainstream deployment has necessitated methods to design
and train efficient architectures tailored to maximize the accuracy under diverse hardware & …

RFVP: Rollback-free value prediction with safe-to-approximate loads

A Yazdanbakhsh, G Pekhimenko, B Thwaites… - ACM Transactions on …, 2016 - dl.acm.org
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …

CrystalBall: Statically analyzing runtime behavior via deep sequence learning

S Zekany, D Rings, N Harada… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
Understanding dynamic program behavior is critical in many stages of the software
development lifecycle, for purposes as diverse as optimization, debugging, testing, and …

Machine learning for run-time energy optimisation in many-core systems

D Biswas, V Balagopal, R Shafik… - … , Automation & Test …, 2017 - ieeexplore.ieee.org
In recent years, the focus of computing has moved away from performance-centric serial
computation to energy-efficient parallel computation. This necessitates run-time optimisation …

Multicore embedded worst-case task design issues and analysis using machine learning logic

S Aradhya, S Thejaswini, V Nagaveni - IOT with Smart Systems …, 2022 - Springer
In critical embedded systems, especially hard real-time systems, worst-case execution time
analysis is considered rather than expensive testing tools. In real time, most of the …